Abstract

We develop a new technique for proving cell-probe lower bounds for static data structures. 
Previous lower bounds used a reduction to communication games, which was known not to be 
tight by counting arguments. We give the first lower bound for an explicit problem which breaks 
this communication complexity barrier. In addition, our bounds give the first separation between 
polynomial and near linear space. Such a separation is inherently impossible by communication 
complexity. 

Using our lower bound technique and new upper bound constructions, we obtain tight bounds 
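for searching predecessors among a static set of integers. Given a set Y of n integers of $\ell$ bits
each, the goal is to efficiently find $\mathrm{predecessor}(x) = \max\{y \in Y \mid y \le x\}$. For this purpose,
we represent Y on a RAM with word length w using S words of space. Defining $a = \lg\frac{S}{n} + \lg w$,
we show that the optimal search time is, up to constant factors:

$$
\min \begin{cases}
\log_w n \\
\lg \frac{\ell - \lg n}{a} \\
\frac{\lg \frac{\ell}{a}}{\lg \left( \frac{a}{\lg n} \cdot \lg \frac{\ell}{a} \right)} \\
\frac{\lg \frac{\ell}{a}}{\lg \left( \lg \frac{\ell}{a} \,/\, \lg \frac{\lg n}{a} \right)}
\end{cases}
$$
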
In external memory ($w > \ell$), it follows that the optimal strategy is to use either standard B-trees,
or a RAM algorithm ignoring the larger block size. In the important case of $w = \ell = \gamma \lg n$,
for $\gamma > 1$ (i.e. polynomial universes), and near linear space (such as $S = n \cdot \lg^{O(1)} n$),
the optimal search time is $\Theta(\lg \ell)$. Thus, our lower bound implies the surprising conclusion that van Emde
Boas' classic data structure from [FOCS'75] is optimal in this case. Note that for space $n^{1+\varepsilon}$, a
running time of $O(\lg \ell / \lg\lg \ell)$ was given by Beame and Fich [STOC'99].



*An extended abstract of this paper appears in the Proceedings of the 38th ACM Symposium on Theory of 
Computing (STOC'06). 



1 Introduction 



In this paper we provide tight trade-offs between query time and space of representation for static 
predecessor search. This is one of the most basic data structures, and the trade-off gives the first 
separation between linear and polynomial space for any data structure problem.

1.1 The Complexity-Theoretic View 

Yao's cell-probe model [21] is typically the model of choice for proving lower bounds on data 
structures. The model assumes the memory is organized in w-bit cells (alternatively called words).
In the case of static data structures, one first constructs a representation of the input in a table 
with a bounded number of cells S (the space complexity). Then, a query can be answered by 
probing certain cells. The time complexity T is defined to be the number of cell probes. The model 
allows free nonuniform computation for both constructing the input representation, and for the 
query algorithm. Thus, the model is stronger than the word RAM or its variants, which are used 
for upper bounds, implementable in a programming language like C. In keeping with the standard 
assumptions on the upper bound side, we only consider $w = \Omega(\lg n)$.

Typically, lower bounds in this model are proved by considering a two-party communication 
game. Assume Bob holds the data structure's input, while Alice holds the query. By simulating 
the cell-probe solution, one can obtain a protocol with T rounds, in which Alice sends $\lg S$ bits and
Bob replies with w bits per round. Thus, a lower bound on the number of rounds translates into a 
cell-probe lower bound. 
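
The simulation can be sketched in a few lines of Python. This is an illustration only, under assumptions that are not from the paper: the table is a sorted array, the cell-probe algorithm is plain binary search, and the name `simulate_protocol` is hypothetical. Each probe becomes one round in which Alice sends a cell address (about lg S bits) and Bob replies with the content of that cell (w bits).

```python
import math

def simulate_protocol(table, x, w):
    """Run binary search for x over `table` as a two-party protocol.

    Bob holds `table` (the S memory cells); Alice holds the query x.
    Returns (predecessor or None, rounds, bits sent by Alice, bits sent by Bob).
    """
    S = len(table)
    addr_bits = max(1, math.ceil(math.log2(S)))  # bits needed to name one cell
    lo, hi = 0, S            # invariant: table[:lo] <= x < table[hi:]
    rounds = alice_bits = bob_bits = 0
    while lo < hi:
        mid = (lo + hi) // 2
        alice_bits += addr_bits   # Alice sends the address of the probed cell
        cell = table[mid]         # Bob answers with the content of that cell
        bob_bits += w
        rounds += 1
        if cell <= x:
            lo = mid + 1
        else:
            hi = mid
    pred = table[lo - 1] if lo > 0 else None
    return pred, rounds, alice_bits, bob_bits
```

In this sketch Bob's reply is a function of the last address only, which is exactly the restriction that a genuine communication protocol does not have to obey.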

Intuitively, we do not expect this relation between cell-probe and communication complexity to 
be tight. In the communication model, Bob can remember past communication, and answer new
queries based on this. Needless to say, if Bob is just a table of cells, he cannot remember anything, 
and his responses must be a function of Alice's last message (i.e. the address of the cell probe). 
By counting arguments, it can be shown [12] that the cell-probe complexity can be much higher 
than the communication complexity, for natural ranges of parameters. However, a separation for 
an explicit problem has only been obtained in a very restricted setting. Gal and Miltersen [11] 
showed such a separation when the space complexity is very close to minimum: given an input of 
n cells, the space used by the data structure is $n + o(n)$.

Besides the reduction to communication complexity, and the approach of [11] for very small 
space, there are no known techniques applicable to static cell-probe complexity with cells of $\Omega(\lg n)$
bits. In particular, we note that the large body of work initiated by Fredman and Saks [9] only 
applies to dynamic problems, such as maintaining partial sums or connectivity. In the case of static 
complexity, there are a few other approaches developed specifically for the bit-probe model ($w = 1$);
see [14]. 

In conclusion, known lower bound techniques for cell-probe complexity cannot surpass the 
communication barrier. However, one could still hope that communication bounds are interesting 
enough for natural data structure problems. Unfortunately, this is often not the case. Notice that 
polynomial differences in S only translate into constant factors in Alice's message size. In the 
communication game model, this can only change constant factors in the number of rounds, since 
Alice can break a longer message into a few separate messages. Unfortunately, this means that 
communication complexity cannot be used to separate, say, polynomial and linear space. For many 
natural data-structure problems, the most interesting behavior occurs close to linear space, so it is 
not surprising that our understanding of static data-structure problems is rather limited. 

In this work, we develop a new lower-bound technique, the cell-probe elimination lemma, targeted
specifically at the cell-probe model. Using this lemma, we obtain a separation between space
$n^{1+o(1)}$ and space $n^{1+\varepsilon}$ for any $\varepsilon > 0$. This also represents a separation between communication
complexity and cell-probe complexity with space $n^{1+o(1)}$. Our lower bounds hold for predecessor
search, one of the most natural and well-studied problems. 

Our lower bound result has a strong direct sum flavor, which is interesting in its own right. 
Essentially, we show that for problems with a certain structure, a data structure solving k independent
subproblems with space $k \cdot \sigma$ cannot do better than k data structures solving each problem
with space $\sigma$.

1.2 The Data-Structural View 

Using our lower bound technique and new upper bound constructions, we obtain tight bounds for 
predecessor search. The problem is to represent an ordered set Y, such that for any query x we 
can find efficiently $\mathrm{predecessor}(x) = \max\{y \in Y \mid y \le x\}$. This is one of the most fundamental
and well-studied problems in data structures. For a comprehensive list of references, we refer to [4]; 
here, we only describe briefly the best known bounds. 

1.2.1 The Upper-Bound Story 

We focus on the static case, where Y is given in advance for preprocessing. For example, we can sort 
Y, and later find the predecessor of x by binary search using $O(\lg n)$ comparisons, where $n = |Y|$.

On computers, we are particularly interested in integer keys. Thereby we also handle, say, 
floating point numbers whose ordering is preserved if they are cast as integers. We can then use all 
the instructions on integers available in a standard programming language such as C, and we are 
no longer limited by the $\Omega(\lg n)$ comparison based lower bound for searching. A strong motivation
for considering integer keys is that integer predecessor search is asymptotically equivalent to the 
IP look-up problem for forwarding packets on the Internet [7]. This problem is extremely relevant 
from a practical perspective. The fastest deployed software solutions use non-comparison-based 
RAM tricks [6]. 

More formally, we will represent Y on a unit-cost word RAM with a given word length w. We
assume each integer in Y has $\ell$ bits, and that $\lg n \le \ell \le w$. On the RAM, the most natural
assumption is $\ell = w$. The case $w > \ell$ models the external memory model with $B = \lfloor w/\ell \rfloor$ keys per
page. In this case, the well-known (comparison-based) B-trees achieve a search time of $O(\log_B n)$.
For the rest of the discussion, assume $w = \ell$.

Using the classic data structure of van Emde Boas [19] from 1975, we can represent our integers 
so that predecessors can be searched in $O(\lg \ell)$ time. The space is linear if we use hashing [20].

In 1990, Fredman and Willard [10] introduced fusion trees, which require linear space and
can answer queries in $O(\log_\ell n)$ time. Combining with van Emde Boas' data structure, they got a
search time of $O(\min\{\frac{\lg n}{\lg \ell}, \lg \ell\})$, which is always $O(\sqrt{\lg n})$.

In 1999, Beame and Fich [4] found an improvement to van Emde Boas' data structure bringing
the search time down to $O(\frac{\lg \ell}{\lg\lg \ell})$. Combined with fusion trees, this gave them a bound of
$O(\min\{\frac{\lg n}{\lg \ell}, \frac{\lg \ell}{\lg\lg \ell}\})$, which is always $O(\sqrt{\frac{\lg n}{\lg\lg n}})$. However, the new data structure of Beame and
Fich uses quadratic space, and they asked if the space could be improved to linear or near-linear.
As a partially affirmative answer to this question, we show that their $O(\frac{\lg \ell}{\lg\lg \ell})$ search time can
be obtained with space $n^{1+1/\exp(\lg^{1-\varepsilon} \ell)}$ for any $\varepsilon > 0$. However, we also show, as our main result,
that with closer to linear space, such as $n \lg^{O(1)} n$, one cannot in general improve the old van Emde
Boas bound of $O(\lg \ell)$.



1.2.2 The Lower-Bound Story 

Ajtai [1] was the first to prove a superconstant lower bound for our problem. His results, with a 
correction by Miltersen [13], can be interpreted as saying that there exists n as a function of $\ell$ such
that the time complexity for polynomial space is $\Omega(\sqrt{\lg \ell})$, and likewise there exists $\ell$ a function of
n making the time complexity $\Omega(\sqrt[3]{\lg n})$.

Miltersen [13] revisited Ajtai's work, showing that the lower bound holds in the communication 
game model, and for a simpler colored predecessor problem. In this problem, the elements of Y 
have an associated color (say, red or blue), and the query asks only for the color of the predecessor 
in Y. This distinction is important, as one can reduce other problems to this simpler problem, 
such as existential range queries in two dimensions [15] or prefix problems in a certain class of 
monoids [13]. Like previous lower bound proofs, ours also holds for the colored problem, making 
the lower bounds applicable to these problems. 

Miltersen, Nisan, Safra and Wigderson [15] once again revisited Ajtai's proof, extending it to 
randomized algorithms. More importantly, they captured the essence of the proof in an independent 
round elimination lemma, which forms a general tool for proving communication lower bounds. Our 
cell-probe elimination lemma is inspired, at a high level, by this result. 

Beame and Fich [4] improved the lower bounds to $\Omega(\frac{\lg \ell}{\lg\lg \ell})$ and $\Omega(\sqrt{\frac{\lg n}{\lg\lg n}})$ respectively. Sen
and Venkatesh [16] later gave an improved round elimination lemma, which can reprove the lower
bounds of Beame and Fich, but also for randomized algorithms. Analyzing the time-space trade-offs
obtained by these proofs, one obtains $\Omega(\min\{\frac{\lg n}{\lg w}, \frac{\lg \ell}{\lg\lg S}\})$, where S is the space bound, and possibly
$w > \ell$.



1.3 The Optimal Trade-Offs 



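Define $\lg x = \lceil \log_2(x + 2) \rceil$, so that $\lg x \ge 1$ even if $x \in [0,1]$. Assuming space S, and defining
$a = \lg\frac{S}{n} + \lg w$, we show that the optimal search time is, up to constant factors:

$$
\min \begin{cases}
\log_w n \\
\lg \frac{\ell - \lg n}{a} \\
\frac{\lg \frac{\ell}{a}}{\lg \left( \frac{a}{\lg n} \cdot \lg \frac{\ell}{a} \right)} \\
\frac{\lg \frac{\ell}{a}}{\lg \left( \lg \frac{\ell}{a} \,/\, \lg \frac{\lg n}{a} \right)}
\end{cases}
\tag{1}
$$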

The upper bounds are achieved by a deterministic query algorithm on a RAM. The data structure
can be constructed in expected time $O(S)$ by a randomized algorithm, starting from a sorted
list of integers. The lower bounds hold for deterministic query algorithms answering the colored
predecessor problem in the cell-probe model. When $S \ge n^{1+\varepsilon}$ for some constant $\varepsilon > 0$, the lower
bounds also hold in the stronger communication game model, even allowing randomization with
two-sided error.
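
To get a feel for the trade-off (1), its four branches can be evaluated numerically. The sketch below is an illustration only: it assumes the four branches are $\log_w n$, $\lg\frac{\ell - \lg n}{a}$, $\lg\frac{\ell}{a} / \lg(\frac{a}{\lg n} \lg\frac{\ell}{a})$ and $\lg\frac{\ell}{a} / \lg(\lg\frac{\ell}{a} / \lg\frac{\lg n}{a})$, it ignores all hidden constant factors, and the function names are hypothetical.

```python
import math

def lg(x):
    """lg x = ceil(log2(x + 2)), so that lg x >= 1 for every x >= 0."""
    return math.ceil(math.log2(x + 2))

def tradeoff_branches(n, ell, w, S):
    """Return (a, branches) where branches holds the four terms of (1)."""
    a = lg(S / n) + lg(w)
    top = lg(ell / a)                                # shared numerator lg(ell/a)
    b1 = lg(n) / lg(w)                               # B-trees / fusion trees
    b2 = lg(max(ell - lg(n), 0) / a)                 # van Emde Boas with tweaks
    b3 = top / lg((a / lg(n)) * top)                 # large-space branch
    b4 = top / lg(top / lg(lg(n) / a))               # small-space branch
    return a, (b1, b2, b3, b4)
```

For example, `min(tradeoff_branches(2**20, 64, 64, 16 * 2**20)[1])` picks the smallest branch for a million 64-bit keys stored in 16n words; the numbers are only meaningful up to constant factors.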

1.3.1 External Memory and Branch One

To understand the first branch of the trade-off, first consider the typical case on a RAM, when
a word fits exactly one integer, i.e. $w = \ell$. In this case, the bound is $\log_\ell n$, which describes the
performance of fusion trees [10].

To understand the case $w > \ell$, consider the external memory model with B words per page. This
model has as a nonuniform counterpart the cell-probe model with cells of size $w = B\ell$. Observe
that only the first branch of our trade-off depends on w. This branch is
$\log_w n = \frac{\lg n}{\lg B + \lg \ell} = \Theta(\min\{\log_\ell n, \log_B n\})$.
The first term describes the performance of fusion trees on a RAM with $\ell$-bit
words, as noted above. The second term matches the performance of the B-tree, the fundamental
data structure in external memory.

Thus, we show that it is always optimal to either use a standard B-tree, or the best RAM 
algorithm which completely ignores the benefits of external memory. The RAM algorithm uses 
$\ell$-bit words, and ignores the grouping of words into pages; this algorithm is the best of fusion trees
and the algorithms from branches 2-4 of the trade-off. Thus, the standard comparison-based B-tree 
is the optimal use of external memory, even in a strong model of computation. 

1.3.2 Polynomial Universes: Branch Two 

For the rest of the discussion, assume the first branch (B-trees and fusion trees) does not give 
the minimum. Some of the most interesting consequences of our results can be seen in the very 
important special case when integers come from a polynomial universe, i.e. $\ell = O(\lg n)$. In this
case, the optimal complexity is $\Theta(\lg \frac{\ell - \lg n}{a})$, as given by the second branch of the trade-off.

On the upper bound side, this is achieved by a simple elaboration of van Emde Boas' data
structure. This data structure gives a way to reduce the key length from $\ell$ to $\frac{\ell}{2}$ in constant time,
which immediately implies an upper bound of $O(\lg \ell)$. To improve that, first note that when $\ell \le a$,
we can stop the recursion and use complete tabulation to find the result. This means only $O(\lg \frac{\ell}{a})$
steps are needed. Another trivial idea, useful for near-linear universes, is to start with a table
lookup based on the first $\lg n$ bits of the key, which requires linear space. Then, continue to apply
van Emde Boas for keys of $w - \lg n$ bits inside each subproblem, giving a complexity of $O(\lg \frac{w - \lg n}{a})$.
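
The step counts in this paragraph can be checked with a short calculation. The sketch below only counts recursion levels; it does not implement van Emde Boas' structure, and the function name `veb_depth` is hypothetical. It assumes each step halves the key length (rounding up) and that the recursion stops as soon as the key length is at most a bits, where tabulation takes over.

```python
def veb_depth(key_bits, a):
    """Count halving steps until the key length drops to at most a bits (a >= 1)."""
    steps = 0
    while key_bits > a:
        key_bits = (key_bits + 1) // 2   # one step keeps half of the key
        steps += 1
    return steps
```

With 40-bit keys, $\lg n = 32$ and $a = 2$, `veb_depth(40, 2)` gives 5 levels, while looking up the first 32 bits in a table and recursing on the remaining 8 bits needs only `veb_depth(8, 2)`, that is 2 levels.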

Quite surprisingly, our lower bound shows that van Emde Boas' classic data structure, with 
these trivial tweaks, is optimal. In particular, when the space is not too far from linear (at most
$n \cdot 2^{\lg^{1-\varepsilon} n}$) and $\ell \ge (1 + \varepsilon) \lg n$, the standard van Emde Boas bound of $\Theta(\lg \ell)$ is optimal. It was
often conjectured that this bound could be improved.

Note that with space $n^{1+\varepsilon}$, the optimal complexity for polynomial universes is constant. However,
with space $n^{1+o(1)}$, the bound is $\omega(1)$, showing the claimed complexity-theoretic separations.

1.3.3 The Last Two Branches 

The last two branches are relevant for superpolynomial universes, i.e. $\ell = \omega(\lg n)$. Comparing the
two branches, we see the third one is better than the last one (up to constants) when $a = \Omega(\lg n)$.
On the other hand, the last branch can be asymptotically better when $a = o(\lg n)$. This bound has
the advantage that in the logarithm in the denominator, the factor $\frac{a}{\lg n}$, which is subconstant for
$a = o(\lg n)$, is replaced by $1 / \lg \frac{\lg n}{a}$.

The third branch is obtained by a careful application of the techniques of Beame and Fich [4] , 
which can improve over van Emde Boas, but need large space. The last branch is also based on 
these techniques, combined with novel approaches tailored for small space. 

1.3.4 Dynamic Updates

Lower bounds for near-linear space easily translate into interesting lower bounds for dynamic problems.
If inserting an element takes time $t_u$, we can obtain a static data structure using space $O(n \cdot t_u)$
by simply simulating n inserts and storing the modified cells in a hash table. This transformation 
works even if updates are randomized, but, as before, we require that queries be deterministic. 
This model of randomized updates and deterministic queries is standard for hashing-based data 
structures. By the discussion above, as long as updates are reasonably fast, one cannot in general 
improve on the $O(\lg \ell)$ query time. It should be noted that van Emde Boas' data structure can
handle updates in the same time as queries, so this classic data structure is also optimal in the 
typical dynamic case, when one is concerned with the slowest operation. 
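
The dynamic-to-static transformation can be sketched as follows. This is a toy illustration, not code from the paper: `RecordingMemory`, `build_static` and `toy_insert` are hypothetical names, and the "dynamic structure" is just a bit vector that writes one cell per insert. The point is only that the static structure consists of the cells modified during the n simulated inserts, kept in a hash table.

```python
class RecordingMemory:
    """Memory of cells that remembers every cell ever written."""

    def __init__(self):
        self.cells = {}                   # hash table: address -> content

    def read(self, addr):
        return self.cells.get(addr, 0)    # untouched cells read as 0

    def write(self, addr, value):
        self.cells[addr] = value


def build_static(keys, insert):
    """Simulate the inserts once and keep only the modified cells."""
    mem = RecordingMemory()
    for key in keys:
        insert(mem, key)
    return mem.cells


def toy_insert(mem, key):
    """Hypothetical dynamic structure: a bit vector, one write per insert."""
    mem.write(key, 1)
```

If each insert writes at most $t_u$ cells, the returned table has at most $n \cdot t_u$ entries, and queries of the dynamic structure can run unchanged against it.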

1.4 Contributions 

We now discuss our contributions in establishing the tight results of (1). Our main result is proving
the tight lower bounds for $a = o(\lg n)$ (in particular, branches two and four of the trade-off). As
mentioned already, previous techniques were helpless, since none could even differentiate $a = 2$
from $a = \lg n$.

Interestingly, we also show improved lower bounds for the case $a = \Omega(\lg n)$, in the classic
communication framework. These improvements are relevant to the third branch of the trade-off.
Assuming for simplicity that $a \le w^{1-\varepsilon}$, our bound is $\min\{\frac{\lg n}{\lg w}, \frac{\lg w}{\lg\lg w + \lg(a / \lg n)}\}$, whereas the best
previous lower bound was $\min\{\frac{\lg n}{\lg w}, \frac{\lg w}{\lg a}\}$. Our improved bound is based on a simple, yet interesting
twist: instead of using the round elimination lemma alone, we show how to combine it with the 
message compression lemma of Chakrabarti and Regev [5] . Message compression is a refinement of 
round elimination, introduced by [5] to prove a lower bound for the approximate nearest neighbor 
problem. Sen and Venkatesh [16] asked whether message compression is really needed, or one could 
just use standard round elimination. Our result sheds an interesting light on this issue, as it shows 
message compression is even useful for classic predecessor lower bounds. 

On the upper bound side, we only need to show the last two branches of the trade-off. As 
mentioned already, we use techniques of Beame and Fich [4]. The third bound was anticipated¹ by
the second author in the concluding remarks of [18]. The last branch of (1), tailored specifically 
for small space, is based on novel ideas. 

1.5 Direct-Sum Interpretations 

A very strong consequence of our proofs is the idea that sharing between subproblems does not 
help for predecessor search. Formally, the best cell-probe complexity achievable by a data structure 
representing k independent subproblems (with the same parameters) in space $k \cdot \sigma$ is asymptotically
equal to the best complexity achievable by a data structure for one subproblem, which uses space
$\sigma$. The simplicity and strength of this statement make it interesting from both the data-structural
and complexity-theoretic perspectives. 

At a high level, it is precisely this sort of direct-sum property that enables us to beat communication
complexity. Say we have k independent subproblems, and total space S. While in the
communication game Alice sends $\lg S$ bits per round, our results intuitively state that $\lg \frac{S}{k}$ bits are
sufficient. Then, by carefully controlling the increase in k and the decrease in key length (the query
size), we can prevent Alice from communicating her entire input over a superconstant number of
rounds.

¹As a remark in [18, Section 7.5], it is stated that "it appears that we can get the following results...", followed
by bounds equivalent to the third branch of (1).

A nice illustration of the strength of our result is given by the tight bounds for near linear universes,
i.e. $\ell = \lg n + \delta$, with $\delta = o(\lg n)$. On the upper bound side, the algorithm can just start by a table
lookup based on the first $\lg n$ bits of the key, which requires linear space. Then, it continues to
apply van Emde Boas for $\delta$-bit keys inside each subproblem, which gives a complexity of $O(\lg \frac{\delta}{a})$.
Obtaining a lower bound is just as easy, given our techniques. We first consider $n / 2^{\delta}$ independent
subproblems, where each has $2^{\delta}$ integers of $2\delta$ bits each. Then, we prefix the integers in each
subproblem by the number of the subproblem (taking $\lg n - \delta$ bits), and prefix the query with a
random subproblem number. Because the universe of each subproblem ($2^{2\delta}$) is quadratically bigger
than the number of keys, we can apply the usual proof showing the optimality of van Emde Boas'
bound for polynomial universes. Thus, the complexity is $\Theta(\lg \frac{\delta}{a})$.

2 Lower Bounds for Small Space 
2.1 The Cell-Probe Elimination Lemma 

An abstract decision data structure problem is defined by a function $f : D \times Q \to \{0,1\}$. An input
from D is given at preprocessing time, and the data structure must store a representation of it in
some bounded space. An input from Q is given at query time, and the function of the two inputs
must be computed through cell probes. We restrict the preprocessing and query algorithms to be
deterministic. In general, we consider a problem in conjunction with a distribution $\mathcal{D}$ over $D \times Q$.
Note that the distribution need not (and, in our case, will not) be a product distribution. We care
about the probability the query algorithm is successful under the distribution $\mathcal{D}$ (for a notion of
success to be defined shortly).

As mentioned before, we work in the cell-probe model, and let w be the number of bits in a cell. 
We assume the query's input consists of at most w bits, and that the space bound is at most $2^w$. For
the sake of an inductive argument, we extend the cell-probe model by allowing the data structure 
to publish some bits at preprocessing time. These are bits depending on the data structure's input, 
which the query algorithm can inspect at no charge. Closely related to this concept is our model for 
a query being successful. We allow the query algorithm not to return the correct answer, but only 
in the following very limited way. After inspecting the query and the published bits, the algorithm 
can declare that it cannot answer the query (we say it rejects the query). Otherwise, the algorithm 
can make cell probes, and at the end it must answer the query correctly. Thus, we require an a 
priori admission of any "error". In contrast to models of silent error, it actually makes sense to 
talk about tiny (close to zero) probabilities of success, even for problems with boolean output. 

For an arbitrary problem f and an integer $k \le 2^w$, we define a direct-sum problem
$\bigoplus^k f : D^k \times ([k] \times Q) \to \{0, 1\}$ as follows. The data structure receives a vector of inputs $(d^1, \ldots, d^k)$. The
representation depends arbitrarily on all of these inputs. The query is the index of a subproblem
$i \in [k]$, and an element $q \in Q$. The output of $\bigoplus^k f$ is $f(d^i, q)$. We also define a distribution
$\bigoplus^k \mathcal{D}$ for $\bigoplus^k f$, given a distribution $\mathcal{D}$ for f. Each $d^i$ is chosen independently at random from the
marginal distribution on D induced by $\mathcal{D}$. The subproblem i is chosen uniformly from $[k]$, and q is
chosen from the distribution on Q conditioned on $d^i$.

Given an arbitrary problem f and an integer $h \le w$, we can define another problem $f^{(h)}$ as
follows. The query is a vector $(q_1, \ldots, q_h)$. The data structure receives a regular input $d \in D$, an
integer $r \in [h]$ and the prefix of the query $q_1, \ldots, q_{r-1}$. The output of $f^{(h)}$ is $f(d, q_r)$. Note that we
have shared information between the data structure and the querier (i.e. the prefix of the query),
so $f^{(h)}$ is a partial function on the domain $D \times \bigcup_{i=0}^{h-1} Q^i \times Q^h$. Now we define an input distribution
$\mathcal{D}^{(h)}$ for $f^{(h)}$, given an input distribution $\mathcal{D}$ for f. The value r is chosen uniformly at random.
Each query coordinate $q_i$ is chosen independently at random from the marginal distribution on Q
induced by $\mathcal{D}$. Now d is chosen from the distribution on D, conditioned on $q_r$.
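
The two operators can be written as higher-order functions, which may help in parsing the notation. This is a sketch under assumptions that are not in the paper: indices are 0-based, a problem f is any Python function `f(d, q)`, and the names `direct_sum` and `prefixed` are hypothetical.

```python
def direct_sum(f):
    """Direct sum of f: the query (i, q) is answered against input number i."""
    def summed(ds, query):
        i, q = query
        return f(ds[i], q)            # only subproblem i matters for the answer
    return summed


def prefixed(f):
    """The problem f^(h): the data side also knows r and the query prefix."""
    def lifted(data_input, qs):
        d, r, prefix = data_input
        if tuple(qs[:r]) != tuple(prefix):
            # partial function: undefined when the shared prefix disagrees
            raise ValueError("query prefix disagrees with the shared prefix")
        return f(d, qs[r])            # the answer depends only on coordinate r
    return lifted
```

Composing them as `direct_sum(prefixed(f))` corresponds to applying the prefix operator first and the direct sum second.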

We give the $f^{(h)}$ operator precedence over the direct sum operator, i.e. $\bigoplus^k f^{(h)}$ means $\bigoplus^k [f^{(h)}]$.
Using this notation, we are ready to state our central cell-probe elimination lemma:

Lemma 1. There exists a universal constant C, such that for any problem f, distribution $\mathcal{D}$, and
positive integers h and k, the following holds. Assume there exists a solution to $\bigoplus^k f^{(h)}$ with success
probability $\delta$ over $\bigoplus^k \mathcal{D}^{(h)}$, which uses at most $k\sigma$ words of space, $\frac{1}{C}(\frac{\delta}{h})^3 k$ published bits and T cell
probes. Then, there exists a solution to $\bigoplus^k f$ with success probability $\frac{\delta}{4h}$ over $\bigoplus^k \mathcal{D}$, which uses
the same space, $k \sqrt[h]{\sigma} \cdot C w^2$ published bits and $T - 1$ cell probes.

2.2 Setup for the Predecessor Problem 

Let $P(n, \ell)$ be the colored predecessor problem on n integers of $\ell$ bits each. Remember that this
is the decision version of predecessor search, where elements are colored red or blue, and a query
just returns the color of the predecessor. We first show how to identify the structure of $P(n, \ell)^{(h)}$
inside $P(n, h\ell)$, making it possible to apply our cell-probe elimination lemma.

Lemma 2. For any integers $n, \ell, h \ge 1$ and distribution $\mathcal{D}$ for $P(n, \ell)$, there exists a distribution
$\mathcal{D}^{*(h)}$ for $P(n, h\ell)$ such that the following holds. Given a solution to $\bigoplus^k P(n, h\ell)$ with success
probability $\delta$ over $\bigoplus^k \mathcal{D}^{*(h)}$, one can obtain a solution to $\bigoplus^k P(n, \ell)^{(h)}$ with success probability $\delta$
over $\bigoplus^k \mathcal{D}^{(h)}$, which has the same complexity in terms of space, published bits, and cell probes.

Proof. We give a reduction from $P(n, \ell)^{(h)}$ to $P(n, h\ell)$, which naturally defines the distribution
$\mathcal{D}^{*(h)}$ in terms of $\mathcal{D}^{(h)}$. A query for $P(n, \ell)^{(h)}$ consists of $x_1, \ldots, x_h \in \{0, 1\}^{\ell}$. Concatenating these,
we obtain a query for $P(n, h\ell)$. In the case of $P(n, \ell)^{(h)}$, the data structure receives $i \in [h]$, the
query prefix $x_1, \ldots, x_{i-1}$ and a set Y of $\ell$-bit integers. We prepend the query prefix to all integers
in Y, and append zeros up to $h\ell$ bits. Then, finding the predecessor of $x_i$ in Y is equivalent to
finding the predecessor of the concatenation of $x_1, \ldots, x_h$ in this new set. □
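
The reduction in this proof is easy to try out on small inputs. The sketch below is illustrative: it works with uncolored keys, uses 0-based positions, and the helper names `predecessor`, `embed` and `concat` are hypothetical.

```python
import bisect

def predecessor(sorted_keys, x):
    """Largest key <= x in a sorted list, or None if there is none."""
    i = bisect.bisect_right(sorted_keys, x)
    return sorted_keys[i - 1] if i else None

def concat(xs, ell):
    """Concatenate ell-bit blocks into one integer."""
    q = 0
    for x in xs:
        q = (q << ell) | x
    return q

def embed(Y, prefix, ell, h):
    """Map each ell-bit key y to (prefix . y . 0...0), a key of h*ell bits."""
    p = concat(prefix, ell)
    pad = (h - len(prefix) - 1) * ell      # number of zero bits appended after y
    return sorted(((p << ell) | y) << pad for y in Y)
```

The predecessor of `concat(xs, ell)` in `embed(Y, xs[:i], ell, h)` carries, in the block at position i, exactly the predecessor of `xs[i]` in Y; if one side has no predecessor, neither does the other.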

Observe that to apply the cell-probe elimination lemma, the number of published bits must be 
just a fraction of k, but applying the lemma increases the published bits significantly. We want 
to repeatedly eliminate cell probes, so we need to amplify the number of subproblems each time, 
making the new number of published bits insignificant compared to the new k. 

Lemma 3. For any integers $t, \ell, n \ge 1$ and distribution $\mathcal{D}$ for $P(n, \ell)$, there exists a distribution
$\mathcal{D}^{*t}$ for $P(n \cdot t, \ell + \lg t)$ such that the following holds. Given a solution to $\bigoplus^k P(n \cdot t, \ell + \lg t)$
with success probability $\delta$ over $\bigoplus^k \mathcal{D}^{*t}$, one can construct a solution to $\bigoplus^{kt} P(n, \ell)$ with success
probability $\delta$ over $\bigoplus^{kt} \mathcal{D}$, which has the same complexity in terms of space, published bits, and cell
probes.

Proof. We first describe the distribution $\mathcal{D}^{*t}$. We draw $Y_0, \ldots, Y_{t-1}$ independently from $\mathcal{D}$, where $Y_j$
is a set of integers, representing the data structure's input. Prefix all numbers in $Y_j$ by j using $\lg t$
bits, and take the union of all these sets to form the data structure's input for $P(nt, \ell + \lg t)$. To
obtain the query, pick $j \in \{0, \ldots, t - 1\}$ uniformly at random, pick the query from $\mathcal{D}$ conditioned on
$Y_j$, and prefix this query by j. Now note that $\bigoplus^{kt} \mathcal{D}$ and $\bigoplus^k \mathcal{D}^{*t}$ are really the same distribution,
except that the lower $\lg t$ bits of the problem index for $\bigoplus^{kt} \mathcal{D}$ are interpreted as a prefix in $\bigoplus^k \mathcal{D}^{*t}$.
Thus, obtaining the new solution is simply a syntactic transformation. □
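
The prefixing step of this proof can be sketched directly. The code is illustrative only: keys are uncolored, the function names are hypothetical, and the example assumes that the query has a predecessor inside its own set $Y_j$ (otherwise the predecessor in the merged set falls into an earlier subproblem).

```python
import bisect

def amplify(subsets, ell):
    """Merge t sets of ell-bit keys into one set of (ell + lg t)-bit keys."""
    merged = []
    for j, Y in enumerate(subsets):
        merged.extend((j << ell) | y for y in Y)   # prefix keys of Y_j by j
    return sorted(merged)

def amplify_query(j, x, ell):
    """Prefix a query x for subproblem j by the subproblem number j."""
    return (j << ell) | x

def predecessor(sorted_keys, x):
    """Largest key <= x in a sorted list, or None if there is none."""
    i = bisect.bisect_right(sorted_keys, x)
    return sorted_keys[i - 1] if i else None
```

Stripping the top bits of the answer (`answer & ((1 << ell) - 1)`) recovers the predecessor within the chosen subproblem.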

Our goal is to eliminate all cell probes, and then reach a contradiction. For this, we need the 
following impossibility result for a solution making zero cell probes: 

Lemma 4. For any $n \ge 1$ and $\ell \ge \log_2(n + 1)$, there exists a distribution $\mathcal{D}$ for $P(n, \ell)$ such that
the following holds. For all $0 < \delta \le 1$ and $k \ge 1$, there does not exist a solution to $\bigoplus^k P(n, \ell)$
with success probability $\delta$ over $\bigoplus^k \mathcal{D}$, which uses no cell probes and less than $\delta k$ published bits.

Proof. The distribution $\mathcal{D}$ is quite simple: the integers in the set are always 0 up to $n - 1$, and
the query is n. All that matters is the color of $n - 1$, which is chosen uniformly at random among
red and blue. Note that for $\bigoplus^k P(n, \ell)$ there are only k possible queries, i.e. only the index of the
subproblem matters.

Let $\mathbf{p}$ be the random variable denoting the published bits. Since there are no cell probes, the
answers to the queries are a function of $\mathbf{p}$ alone. Let $\delta(p)$ be the fraction of subproblems that
the query algorithm doesn't reject when seeing the published bits p. In our model, the answer
must be correct for all these subproblems. Then, $\Pr[\mathbf{p} = p] \le 2^{-\delta(p) k}$, as only inputs which
agree with the $\delta(p) k$ answers of the algorithm can lead to these published bits. Now observe that
$\delta = \mathrm{E}_p[\delta(p)] \le \mathrm{E}_p\left[\frac{1}{k} \log_2 \frac{1}{\Pr[\mathbf{p} = p]}\right] = \frac{1}{k} H(\mathbf{p})$, where $H(\cdot)$ denotes binary entropy. Since the entropy
of the published bits is bounded by their number (less than $\delta k$), we have a contradiction. □
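
The counting argument can be checked by brute force for small k. In the sketch below, a "publishing scheme" is any function from the k random colors (packed into an integer) to a published value; for each published value, the query algorithm may answer exactly those subproblems on which all consistent inputs agree. The function name `analyze` and this encoding are assumptions made for the illustration.

```python
import math
from collections import defaultdict

def analyze(k, publish):
    """Return (delta, entropy) for a scheme mapping k color bits to published bits."""
    groups = defaultdict(list)
    for colors in range(2 ** k):
        groups[publish(colors)].append(colors)
    delta = entropy = 0.0
    for inputs in groups.values():
        prob = len(inputs) / 2 ** k
        # subproblems whose color is the same in every consistent input
        agree = sum(
            1 for j in range(k)
            if len({(c >> j) & 1 for c in inputs}) == 1
        )
        assert prob <= 2.0 ** (-agree)      # Pr[p = p] <= 2^(-delta(p) k)
        delta += prob * agree / k
        entropy += prob * math.log2(1 / prob)
    return delta, entropy
```

For instance, publishing two of four colors gives success probability 1/2 with entropy 2, matching $\delta = H(\mathbf{p}) / k$, while publishing the parity of all colors answers no subproblem at all.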



2.3 Showing Predecessor Lower Bounds 

Our proof starts by assuming that, for any possible distribution, we have a solution to $P(n, \ell)$ which
uses $n \cdot 2^a$ space, no published bits, and successfully answers all queries in T probes, where T is
small. We will then try to apply T rounds of the cell-probe elimination from Lemmas 1 and 2
followed by the problem amplification from Lemma 3. After T rounds, we will be left with a non-trivial
problem but no cell probes, and then we will reach a contradiction with Lemma 4. Below,
we first run this strategy ignoring details about the distribution, but analyzing the parameters for
each round. Later in Lemma 5, we will present a formal inductive proof using these parameters in
reverse order, deriving difficult distributions for more and more cell probes.

We denote the problem parameters after i rounds by a subscript i. We have the key length $\ell_i$
and the number of subproblems $k_i$. The total number of keys remains n, so we have $n / k_i$ keys in
each subproblem. Thus, the problem we deal with in round $i + 1$ is $\bigoplus^{k_i} P(\frac{n}{k_i}, \ell_i)$, and we will have
some target success probability $\delta_i$. The number of cells per subproblem is $\sigma_i = \frac{n}{k_i} 2^a$. We start the
first round with $\ell_0 = \ell$, $\delta_0 = 1$, $k_0 = 1$ and $\sigma_0 = n \cdot 2^a$.

For the cell probe elimination in Lemmas 1 and 2, our proof will use the same value of $h \ge 2$
in all rounds. Then $\delta_{i+1} \ge \frac{\delta_i}{4h}$, so $\delta_i \ge (4h)^{-i}$. To analyze the evolution of $\ell_i$ and $k_i$, we let $t_i$ be
the factor by which we increase the number of subproblems in round i when applying the problem
amplification from Lemma 3. We now have $k_{i+1} = t_i \cdot k_i$ and $\ell_{i+1} = \frac{\ell_i}{h} - \lg t_i$.

When we start the first round, we have no published bits, but when we apply Lemma 1 in round
$i + 1$, it leaves us with up to $k_i \sqrt[h]{\sigma_i} \cdot C w^2$ published bits for round $i + 2$. We have to choose $t_i$ large
enough to guarantee that this number of published bits is small enough compared to the number
of subproblems in round $i + 2$. To apply Lemma 1 in round $i + 2$, the number of published bits
must be at most $\frac{1}{C}(\frac{\delta_{i+1}}{h})^3 k_{i+1} = \frac{1}{C}(\frac{\delta_{i+1}}{h})^3 k_i t_i$. Hence we must set $t_i \ge \sqrt[h]{\sigma_i} \cdot 64 C^2 w^2 h^6 (\frac{1}{\delta_i})^3$. Assume
for now that $T = O(\lg \ell)$. Using $h \le \ell$ and $\delta_i \ge (4h)^{-T} \ge 2^{-\Theta(\lg^2 \ell)}$, we conclude it is enough to set

$$
(\forall) i: \quad t_i \ge \sqrt[h]{\frac{n}{k_i}} \cdot 2^{a/h} \cdot w^2 \cdot 2^{\Theta(\lg^2 \ell)} \tag{2}
$$

Now we discuss the conclusion reached at the end of the T rounds. We intend to apply Lemma 4
to deduce that the algorithm after T stages cannot make zero cell probes, implying that the original
algorithm had to make more than T probes. Above we made sure that after T rounds we had
$\frac{1}{C}(\frac{\delta_T}{h})^3 k_T < \delta_T k_T$ published bits, which are few enough compared to the number $k_T$ of subproblems.
The remaining conditions of Lemma 4 are:

$$
\ell_T \ge 1 \quad \text{and} \quad \frac{n}{k_T} \ge 1 \tag{3}
$$

Since $\ell_{i+1} \le \frac{\ell_i}{2}$, this condition entails $T = O(\lg \ell)$, as assumed earlier.
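
The evolution of the parameters can be simulated to see how many rounds condition (3) permits. This sketch makes simplifying assumptions that are not in the paper: the amplification factor t is the same in every round and is supplied by the caller rather than derived from (2), key lengths are treated as real numbers, and `math.log2` is used in place of the rounded lg. The name `count_rounds` is hypothetical.

```python
import math

def count_rounds(n, ell, h, t):
    """Count rounds T such that the key length and n/k both stay at least 1."""
    k, T = 1, 0
    lg_t = math.log2(t)
    while True:
        next_k = k * t                    # k_{i+1} = t_i * k_i
        next_ell = ell / h - lg_t         # ell_{i+1} = ell_i / h - lg t_i
        if next_ell < 1 or n / next_k < 1:
            return T
        k, ell, T = next_k, next_ell, T + 1
```

With $n = 2^{20}$, $\ell = 1024$, $h = 2$ and $t = 4$ the key length runs out after 7 rounds, below $\lg \ell = 10$; with very long keys and few elements, the number of subproblems becomes the limiting factor instead.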

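Lemma 5. With the above parameters satisfying (2) and (3), for $i = 0, \ldots, T$, there is a distribution
$\mathcal{D}_i$ for $P(\frac{n}{k_i}, \ell_i)$ so that no solution for $\bigoplus^{k_i} P(\frac{n}{k_i}, \ell_i)$ can have success probability $\delta_i$ over $\bigoplus^{k_i} \mathcal{D}_i$
using $n \cdot 2^a$ space, $\frac{1}{C}(\frac{\delta_i}{h})^3 k_i$ published bits, and $T - i$ cell probes.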

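Proof. The proof is by induction over $T - i$. A distribution that defies a good solution as in the
lemma is called difficult. In the base case $i = T$, the space doesn't matter, and we get the difficult
distribution directly from (3) and Lemma 4. Inductively, we use a difficult distribution $\mathcal{D}_i$ to
construct a difficult distribution $\mathcal{D}_{i-1}$.

Recall that $k_i = k_{i-1} t_{i-1}$. Given our difficult distribution $\mathcal{D}_i$, we use the problem amplification
in Lemma 3, to construct a distribution $\mathcal{D}_i^{*t_{i-1}}$ for $P(\frac{n}{k_i} \cdot t_{i-1}, \ell_i + \lg t_{i-1}) = P(\frac{n}{k_{i-1}}, \ell_i + \lg t_{i-1})$
so that no solution for $\bigoplus^{k_{i-1}} P(\frac{n}{k_{i-1}}, \ell_i + \lg t_{i-1})$ can have success probability $\delta_i$ over $\bigoplus^{k_{i-1}} \mathcal{D}_i^{*t_{i-1}}$
using $n \cdot 2^a$ space, $\frac{1}{C}(\frac{\delta_i}{h})^3 k_i$ published bits, and $T - i$ cell probes.

Recall that (2) implies $k_{i-1} \sqrt[h]{\sigma_{i-1}} \cdot C w^2 \le \frac{1}{C}(\frac{\delta_i}{h})^3 k_i$, hence that $k_{i-1} \sqrt[h]{\sigma_{i-1}} \cdot C w^2$ is less than the
number of bits allowed published for our difficult distribution $\mathcal{D}_i^{*t_{i-1}}$. Also, recall that $\sigma_j k_j = n \cdot 2^a$
for all j. We can therefore use the cell probe elimination in Lemma 1, to construct a distribution
$(\mathcal{D}_i^{*t_{i-1}})^{(h)}$ for $P(\frac{n}{k_{i-1}}, \ell_i + \lg t_{i-1})^{(h)}$ so that no solution for $\bigoplus^{k_{i-1}} P(\frac{n}{k_{i-1}}, \ell_i + \lg t_{i-1})^{(h)}$ can have
success probability $\delta_{i-1} \ge 4h\delta_i$ over $\bigoplus^{k_{i-1}} (\mathcal{D}_i^{*t_{i-1}})^{(h)}$ using $n \cdot 2^a$ space, $\frac{1}{C}(\frac{\delta_{i-1}}{h})^3 k_{i-1}$ published
bits, and $T - i + 1$ cell probes. Finally, using Lemma 2, we use $(\mathcal{D}_i^{*t_{i-1}})^{(h)}$ to construct the desired
difficult distribution $\mathcal{D}_{i-1}$ for $P(\frac{n}{k_{i-1}}, h(\ell_i + \lg t_{i-1})) = P(\frac{n}{k_{i-1}}, \ell_{i-1})$. □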

The predecessor lower bound then follows by applying Lemma 5 with $i = 0$ and the initial
parameters $\ell_0 = \ell$, $\delta_0 = 1$, $k_0 = 1$. We conclude that there is a difficult distribution $\mathcal{D}_0$ for $P(n, \ell)$
with no solution getting success probability 1 using $n \cdot 2^a$ space, 0 published bits, and $T$ cell probes.

2.4 Calculating the Trade-Offs



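In this section, we show how to choose $h$ and $t_i$ in order to maximize the lower bound $T$, under the
conditions of (2) and (3). First, we show a simple bound on a recursion that shows up repeatedly
in our analysis:

Lemma 6. Consider the recursion $x_{i+1} \ge \alpha x_i - \gamma$, for $\gamma \ge 1$. As long as $i \le \log_{1/\alpha}\left(\frac{x_0}{1 + \gamma/(1-\alpha)}\right)$,
we have $x_i \ge 1$.

Proof. Expanding the recursion, we have $x_i \ge x_0\alpha^i - \gamma(\alpha^{i-1} + \cdots + 1) = x_0\alpha^i - \gamma\frac{1-\alpha^i}{1-\alpha}$. For $x_i \ge 1$, we
must have $x_0\alpha^i \ge 1 + \gamma\frac{1-\alpha^i}{1-\alpha}$, which is true if $x_0\alpha^i \ge 1 + \frac{\gamma}{1-\alpha}$. This gives $i \le \log_{1/\alpha}\left(\frac{x_0}{1 + \gamma/(1-\alpha)}\right)$. □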
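
The recursion bound of Lemma 6 can be checked numerically. The following is a minimal sketch, assuming $0 < \alpha < 1$ and taking the threshold $\log_{1/\alpha}\left(x_0 / (1 + \gamma/(1-\alpha))\right)$ that the proof derives; the function names and the sample values are illustrative only and do not come from the paper.

```python
import math

def lemma6_limit(x0, alpha, gamma):
    """Largest real i with x0 * alpha**i >= 1 + gamma / (1 - alpha)."""
    return math.log(x0 / (1 + gamma / (1 - alpha)), 1 / alpha)

def worst_case_sequence(x0, alpha, gamma, steps):
    """Iterate x_{i+1} = alpha * x_i - gamma, the smallest values the recursion allows."""
    xs = [x0]
    for _ in range(steps):
        xs.append(alpha * xs[-1] - gamma)
    return xs

# Illustrative parameters (assumptions, not taken from the paper).
x0, alpha, gamma = 1e6, 0.5, 3.0
limit = lemma6_limit(x0, alpha, gamma)
xs = worst_case_sequence(x0, alpha, gamma, math.floor(limit))
print(math.floor(limit), min(xs))
```

Iterating with equality is the worst case, so if every term stays at or above 1 here, the same holds for any sequence satisfying the inequality.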

We now argue that the bound for low space that we are trying to prove can only be better than 

the communication complexity lower bound when Igi = 0((lglgn)^). This is relevant because our 
cell-probe elimination lemma is less than perfect in its technical details, and cannot always achieve 
the optimal bound. Fortunately, however, it does imply an optimal bound when £ is not too large, 
and in the remaining cases an optimal lower bound follows from communication complexity. 

Remember that for space O(n^), communication complexity implies an asymptotic lower bound 
of liSwfi^}- If = ms^gnf), this is e(min{||^,iis|j}). For a < Ign, we are 

trying to prove an asymptotic lower bound of min{j|^, — -^TT^^TgTrr}- If Ig-^ = ^((Iglg^^)^)) this 
becomes 6(min{^^, ^^^j), which is identical to the communication bound. 

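Polynomial Universes. Assume that $\ell \ge 3\lg n$. We first show a lower bound of $\Omega(\lg\frac{\lg n}{a})$,
which matches van Emde Boas on polynomial universes. For this, it suffices to set $h = 2$ and
$t_i = (\frac{n}{k_i})^{3/4}$. Then, $\frac{n}{k_{i+1}} = (\frac{n}{k_i})^{1/4}$, so $\lg\frac{n}{k_i} = 4^{-i}\lg n$ and $\lg t_i = \frac{3}{4}4^{-i}\lg n$. By our recursion
for $\ell_i$, we have $\ell_{i+1} = \frac{\ell_i}{2} - \frac{3}{4}4^{-i}\lg n$. Given $\ell_0 = \ell \ge 3\lg n$, it can be seen by induction that
$\ell_i \ge 3 \cdot 4^{-i}\lg n$. Indeed, $\ell_{i+1} \ge 3 \cdot 4^{-i} \cdot \frac{1}{2}\lg n - \frac{3}{4}4^{-i}\lg n \ge 3 \cdot 4^{-(i+1)}\lg n$. By the above, (3) is
satisfied for $T \le \Theta(\lg\lg n)$. Finally, note that condition (2) is equivalent to: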

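$$\lg t_i \ge \frac{1}{2}\lg\frac{n}{k_i} + \frac{a}{2} + \Theta(\lg w + \lg^2 \ell) \iff \frac{3}{4}4^{-i}\lg n \ge \frac{1}{2}4^{-i}\lg n + \frac{a}{2} + \Theta(\lg w + \lg^2 \ell)$$

$$\iff T \le \Theta\left(\lg\min\left\{\frac{\lg n}{a}, \frac{\lg n}{\lg w + \lg^2 \ell}\right\}\right) = \Theta\left(\min\left\{\lg\frac{\lg n}{a}, \lg\lg n\right\}\right) = \Theta\left(\lg\frac{\lg n}{a}\right)$$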

Here we have used Igw = 0((lglgn)^), which is the regime in which our bound for small space can 
be an improvement over the communication bound. 
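
The parameter evolution for polynomial universes can be traced with exact arithmetic. The sketch below assumes $h = 2$ and $t_i = (n/k_i)^{3/4}$ as in the paragraph above, and works directly with the logarithms $\lg(n/k_i)$ and $\ell_i$; the function name and the sample inputs are hypothetical.

```python
from fractions import Fraction

def simulate(lg_n, ell0, rounds):
    """Return [(i, lg(n/k_i), ell_i)] for h = 2 and t_i = (n/k_i)^(3/4)."""
    lg_nk = Fraction(lg_n)   # lg(n / k_i), starts at lg n because k_0 = 1
    ell = Fraction(ell0)     # ell_i, the current key length
    trace = []
    for i in range(rounds + 1):
        trace.append((i, lg_nk, ell))
        lg_t = Fraction(3, 4) * lg_nk   # lg t_i
        ell = ell / 2 - lg_t            # ell_{i+1} = ell_i / h - lg t_i
        lg_nk = lg_nk - lg_t            # lg(n/k_{i+1}) = lg(n/k_i) - lg t_i
    return trace

for i, lg_nk, ell in simulate(4096, 3 * 4096, 4):
    print(i, lg_nk, ell)
```

In this trace $\lg(n/k_i)$ shrinks by a factor of 4 per round and $\ell_i$ never drops below $3 \cdot 4^{-i} \lg n$, which is the invariant used in the induction.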

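Handling Larger Universes. We now show how one can take advantage of a higher $w$ to
obtain larger lower bounds. We continue to assume $w \ge 3\lg n$. Our strategy is to use the smallest
$t_i$ possible according to (2) and superconstant $h$. To analyze the recursion for $\ell_i$, we just bound
$t_i \le n$, so $\ell_{i+1} \ge \frac{\ell_i}{h} - \lg n$. Using Lemma 6, we have $\ell_T \ge 1$ for $T \le \Theta(\lg_h(\frac{w}{\lg n}))$. We also have the
recursion:

$$\lg\frac{n}{k_{i+1}} = \lg\frac{n}{k_i} - \lg t_i = \left(1 - \frac{1}{h}\right)\lg\frac{n}{k_i} - \frac{a}{h} - \Theta(\lg^2 w)$$

Again by Lemma 6, we see that $\frac{n}{k_T} \ge 1$ if:

$$T \le \Theta\left(\log_{\frac{1}{1 - 1/h}} \frac{\lg n}{h \cdot \left(\frac{a}{h} + \lg^2 w\right)}\right) = \Theta\left(h\lg\frac{\lg n}{a + h\lg^2 w}\right) = \Theta\left(\min\left\{h\lg\frac{\lg n}{a},\ h\lg\frac{\lg n}{h\lg^2 w}\right\}\right)$$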

As mentioned before, the condition $\ell_T \ge 1$ in (3) implies $T = O(\lg w)$, so we can assume $h =
O(\lg w)$. Remember that we are assuming $\lg w = O((\lg\lg n)^2)$, so the second term in the min is
just $\Theta(h\lg\lg n)$. Then, the entire expression simplifies to $\Theta(h\lg\frac{\lg n}{a})$.

The lower bound we obtain is the minimum of the bounds derived by considering $\ell_i$ and $k_i$.
We then choose $h$ to maximize this minimum, arriving at:

$$\Omega\left(\max_h \min\left\{\frac{\lg(w/\lg n)}{\lg h},\ h\lg\frac{\lg n}{a}\right\}\right)$$

Clearly, the $\Omega(\lg\frac{\lg n}{a})$ bound derived previously still holds. Then, we can claim a lower bound
that is the maximum of this and our new bound, or, equivalently up to constants, their sum:

$$\lg\frac{\lg n}{a} + \max_h \min\left\{\frac{\lg(w/\lg n)}{\lg h},\ h\lg\frac{\lg n}{a}\right\} = \max_h \min\left\{\lg\frac{\lg n}{a} + \frac{\lg(w/\lg n)}{\lg h},\ (h + 1)\lg\frac{\lg n}{a}\right\}$$

$$\ge \max_h \min\left\{\frac{\lg(w/\lg n) + \lg(\lg n/a)}{\lg h},\ h\lg\frac{\lg n}{a}\right\} = \max_h \min\left\{\frac{\lg(w/a)}{\lg h},\ h\lg\frac{\lg n}{a}\right\}$$

We choose $h$ to balance the two terms, so $h\lg h = \frac{\lg(w/a)}{\lg(\lg n/a)}$ and $\lg h = \Theta(\lg\lg\frac{w}{a} - \lg\lg\frac{\lg n}{a})$. Then
the bound is $\Omega\left(\frac{\lg(w/a)}{\lg\lg(w/a) - \lg\lg(\lg n/a)}\right)$.

Handling Smaller Universes. Finally, we consider smaller universes, i.e. $w < 3\lg n$. Let
$w = \delta + \lg n$. We start by applying Lemma 3 once, with $t = n/2^{\delta/2}$. Now we are looking at
the problem $\bigoplus^t P(2^{\delta/2}, \frac{3}{2}\delta)$. Observe that the subproblems have a universe which is cubic in
the number of integers in the subproblem. Then, we can just apply our strategy for polynomial
universes, starting with $\ell_0 = \frac{3}{2}\delta$ and $n_0 = 2^{\delta/2}$. We obtain a lower bound of $\Omega(\lg\frac{\delta}{a}) = \Omega(\lg\frac{w - \lg n}{a})$.



3 Proof of Cell-Probe Elimination 

We assume a solution to $\bigoplus^k f^{(h)}$ and use it to construct a solution to $\bigoplus^k f$. The new solution
uses the query algorithm of the old solution, but skips the first cell probe made by this algorithm.
A central component of our construction is a structural property about any query algorithm for
$\bigoplus^k f^{(h)}$ with the input distribution $\bigoplus^k \mathcal{D}^{(h)}$. We now define and claim this property. Section 3.1
uses it to construct a solution for $\bigoplus^k f$, while Section 3.2 gives the proof.

We first introduce some convenient notation. Remember that the data structure's input for
$\bigoplus^k f^{(h)}$ consists of a vector $(d^1, \ldots, d^k) \in D^k$, a vector selecting the interesting segments $(r^1, \ldots, r^k) \in
[h]^k$ and the query prefixes $Q^i_j$ for all $j \in [r^i - 1]$. Denote by $\mathbf{d}$, $\mathbf{r}$ and $\mathbf{Q}$ the random variables
giving these three components of the input. Also let $\mathbf{p}$ be the random variable representing the bits
published by the data structure. Note that $\mathbf{p}$ can also be understood as a function $\mathbf{p}(\mathbf{d}, \mathbf{r}, \mathbf{Q})$. The
query consists of an index $i$ selecting the interesting subproblem, and a vector $(q_1, \ldots, q_h)$ with a
query to that subproblem. Denote by $\mathbf{i}$ and $\mathbf{q}$ these random variables. Note that in our probability
space $\bigoplus^k f^{(h)}$, we have $\mathbf{q}_j = \mathbf{Q}^{\mathbf{i}}_j$, $(\forall) j < \mathbf{r}^{\mathbf{i}}$.

Fix some instance $p$ of the published bits and a subproblem index $i \in [k]$. Consider a prefix
$(q_1, \ldots, q_j)$ for a query to this subproblem. Depending on $q_{j+1}, \ldots, q_h$, the query algorithm might
begin by probing different cells, or might reject the query. Let $\Gamma^i(p; q_1, \ldots, q_j)$ be the set of cells
that could be inspected by the first cell probe. Note that this set could be $\emptyset$, if all queries are
rejected.

Now define:

$$\varepsilon^i(p) = \Pr\left[\ \Gamma^i(p; \mathbf{Q}^i) \ne \emptyset \ \text{ and } \ |\Gamma^i(p; \mathbf{q}_1, \ldots, \mathbf{q}_{\mathbf{r}^i})| \ge \frac{\min\{\sigma, |\Gamma^i(p; \mathbf{Q}^i)|\}}{\sqrt[h]{\sigma}} \ \middle|\ \mathbf{i} = i\ \right] \qquad (4)$$

The probability space is that defined by $\bigoplus^k \mathcal{D}^{(h)}$ when the query is to subproblem $i$. In
particular, such a query will satisfy $\mathbf{q}_j = \mathbf{Q}^i_j$, $(\forall) j < \mathbf{r}^i$, because the prefix is known to the data
structure. Note that this definition completely ignores the suffix $\mathbf{q}_{\mathbf{r}^i + 1}, \ldots, \mathbf{q}_h$ of the query. The
intuition behind this is that for any choice of the suffix, the correct answer to the query is the same,
so this suffix can be "manufactured" at will. Indeed, an arbitrary choice of the suffix is buried in
the definition of $\Gamma^i$.

With these observations, it is easier to understand (4). If the data structure knows that no
query to subproblem $i$ will be successful, $\varepsilon^i = 0$. Otherwise, we compare two sets of cells. The
first contains the cells that the querier might probe given what the data structure knows: $\Gamma^i(p; \mathbf{Q}^i)$
contains all cells that could be probed for various $q_{r^i}$ and various suffixes. The second contains
the cells that the querier could choose to probe considering its given input $q_{r^i}$ (the querier is only
free to choose the suffix). Obviously, the second set is a subset of the first. The good case, whose
probability is measured by $\varepsilon^i$, is when it is a rather large subset, or at least large compared to $\sigma$.

For convenience, we define $\varepsilon^*(p) = \mathbf{E}_{i \in [k]}[\varepsilon^i(p)] = \frac{1}{k}\sum_i \varepsilon^i(p)$. Using standard notation from
probability theory, we write $\varepsilon^i(p \mid E)$, when we condition on some event $E$ in the probability of (4).
We also write $\varepsilon^i(p \mid X)$ when we condition on some random variable $X$, i.e. $\varepsilon^i(p \mid X)$ is a function
$x \mapsto \varepsilon^i(p \mid X = x)$. We are now ready to state our claim, to be proven in Section 3.2.

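Lemma 7. There exist $r$ and $\mathcal{Q}$, such that $\mathbf{E}_{\mathbf{d}}[\varepsilon^*(\mathbf{p}(r, \mathcal{Q}, \mathbf{d}) \mid \mathbf{r} = r, \mathbf{Q} = \mathcal{Q}, \mathbf{d})] \ge \frac{\delta}{4h}$.

3.1 The Solution for $\bigoplus^k f$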

As mentioned before, we use the solution for $\bigoplus^k f^{(h)}$, and try to skip the first cell probe. To use
this strategy, we need to extend an instance of $\bigoplus^k f$ to an instance of $\bigoplus^k f^{(h)}$. This is done using
the $r$ and $\mathcal{Q}$ values whose existence is guaranteed by Lemma 7. The extended data structure's input
consists of the vector $(d^1, \ldots, d^k)$ given to $\bigoplus^k f$, and the vectors $r$ and $\mathcal{Q}$. A query's input for $\bigoplus^k f$
is a problem index $i \in [k]$ and a $q \in Q$. We extend this to $(q_1, \ldots, q_h)$ by letting $q_j = \mathcal{Q}^i_j$, $(\forall) j < r^i$,
and $q_{r^i} = q$, and manufacturing a suffix $q_{r^i + 1}, \ldots, q_h$ as described below.

First note that extending an input of $\bigoplus^k f$ to an input of $\bigoplus^k f^{(h)}$ by this strategy preserves
the desired answer to a query (in particular, the suffix is irrelevant to the answer). Also, this
transformation is well defined because $r$ and $\mathcal{Q}$ are "constants", defined by the input distribution
$\bigoplus^k \mathcal{D}^{(h)}$. Since our model is nonuniform, we only care about the existence of $r$ and $\mathcal{Q}$, and not
about computational aspects.

To fully describe a solution to $\bigoplus^k f$, we must specify how to obtain the data structure's
representation and the published bits, and how the query algorithm works. The data structure's
representation is identical to the representation for $\bigoplus^k f^{(h)}$, given the extended input. The
published bits for $\bigoplus^k f$ consist of the published bits for $\bigoplus^k f^{(h)}$, plus a number of published cells from
the data structure's representation. Which cells are published will be detailed below. We publish
the cell address together with its contents, so that the query algorithm can tell whether a particular
cell is available.

The query algorithm is now simple to describe. Remember that $q_1, \ldots, q_{r^i - 1}$ are prescribed by
$\mathcal{Q}$, and $q_{r^i} = q$ is the original input of $\bigoplus^k f$. We now iterate through all possible query suffixes. For
each possibility, we simulate the extended query using the algorithm for $\bigoplus^k f^{(h)}$. If this algorithm
rejects the query, or the first probed cell is not among the published cells, we continue trying
suffixes. Otherwise, we stop, obtain the value for the first cell probe from the published cells and
continue to simulate this query using actual cell probes. If we don't find any good suffix, we reject
the query. It is essential that we can recognize success in the old algorithm by looking just at
published bits. Then, searching for a suffix that would not be rejected is free, as it does not involve
any cell probes.



Publishing cells. It remains to describe which cells the data structure chooses to publish, in
order to make the query algorithm successful with the desired probability. Let $p$ be the bits
published by the $\bigoplus^k f^{(h)}$ solution. Note that in order to make the query $(i, q)$ successful, we must
publish one cell from $\Gamma^i(p; \mathcal{Q}^i, q)$. Here, we slightly abuse notation by letting $\mathcal{Q}^i, q$ denote the $r^i$
entries of the prefix $\mathcal{Q}^i$, followed by $q$. We will be able to achieve this for all $(i, q)$ satisfying:

$$\Gamma^i(p; \mathcal{Q}^i) \ne \emptyset \quad\text{and}\quad |\Gamma^i(p; \mathcal{Q}^i, q)| \ge \frac{\min\{\sigma, |\Gamma^i(p; \mathcal{Q}^i)|\}}{\sqrt[h]{\sigma}} \qquad (5)$$

Comparing to (4), this means the success probability is at least $\varepsilon^*(p \mid \mathbf{r} = r, \mathbf{Q} = \mathcal{Q}, \mathbf{d} =
(d^1, \ldots, d^k))$. Then on average over possible inputs $(d^1, \ldots, d^k)$ to $\bigoplus^k f$, the success probability
will be at least the bound guaranteed by Lemma 7.
We will need the following standard result:

Lemma 8. Consider a universe $U \ne \emptyset$ and a family of sets $\mathcal{F}$ such that $(\forall) S \in \mathcal{F}$ we have $S \subset U$
and $|S| \ge \frac{|U|}{B}$. Then there exists a set $T \subset U$, $|T| \le B\ln|\mathcal{F}|$ such that $(\forall) S \in \mathcal{F}$, $S \cap T \ne \emptyset$.

Proof. Choose $B\ln|\mathcal{F}|$ elements of $U$ with replacement. For a fixed $S \in \mathcal{F}$, an element is outside $S$
with probability at most $1 - \frac{1}{B}$. The probability all elements are outside $S$ is at most $(1 - \frac{1}{B})^{B\ln|\mathcal{F}|} <
e^{-\ln|\mathcal{F}|} = \frac{1}{|\mathcal{F}|}$. By the union bound, all sets in $\mathcal{F}$ are hit at least once with positive probability, so
a good $T$ exists. □
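
The probabilistic argument of Lemma 8 translates directly into a sampling procedure. The sketch below is only an illustration under stated assumptions: it rounds the sample size $B\ln|\mathcal{F}|$ up to an integer, retries until a sample hits every set, and uses hypothetical names (`random_hitting_set`, `max_tries`) that do not appear in the paper.

```python
import math
import random

def random_hitting_set(universe, family, B, rng, max_tries=1000):
    """Sample about B * ln|family| elements with replacement until all sets are hit."""
    # Precondition of Lemma 8: every set has at least |U| / B elements.
    assert all(len(S) * B >= len(universe) for S in family)
    size = max(1, math.ceil(B * math.log(len(family))))  # rounded up (assumption)
    for _ in range(max_tries):
        T = {rng.choice(universe) for _ in range(size)}
        if all(T & S for S in family):
            return T
    raise RuntimeError("no hitting set found; the precondition may be violated")

rng = random.Random(0)
U = list(range(100))
family = [set(rng.sample(U, 25)) for _ in range(20)]  # each set has |U| / 4 elements
T = random_hitting_set(U, family, 4, rng)
print(len(T), all(T & S for S in family))
```

The lemma only asserts existence, which is all the nonuniform model needs; the retry loop is a convenience for experimentation rather than part of the proof.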

We distinguish three types of subproblems, parallel to (5). If $\Gamma^i(p; \mathcal{Q}^i) = \emptyset$, we make no claim
(the success probability can be zero). Otherwise, if $|\Gamma^i(p; \mathcal{Q}^i)| < \sigma$, we handle subproblem $i$ using a
local strategy. Consider all $q$ such that $|\Gamma^i(p; \mathcal{Q}^i, q)| \ge \frac{|\Gamma^i(p; \mathcal{Q}^i)|}{\sqrt[h]{\sigma}}$. We now apply Lemma 8 with the
universe $\Gamma^i(p; \mathcal{Q}^i)$ and the family $\Gamma^i(p; \mathcal{Q}^i, q)$, for all interesting $q$'s. There are at most $2^w$ choices of
$q$, bounding the size of the family. Then, the lemma guarantees that the data structure can publish
a set of $O(\sqrt[h]{\sigma} \cdot w)$ cells which contains at least one cell from each interesting set. This means that
each interesting $q$ can be handled successfully by the algorithm.

We handle the third type of subproblems, those with $|\Gamma^i(p; \mathcal{Q}^i)| \ge \sigma$, in a global fashion.
Consider all "interesting" pairs $(i, q)$ with $|\Gamma^i(p; \mathcal{Q}^i, q)| \ge \sigma^{1 - 1/h}$. We now apply Lemma 8 with
the universe consisting of all $k\sigma$ cells, and the family being $\Gamma^i(p; \mathcal{Q}^i, q)$, for interesting $(i, q)$. The
cardinality of the family is at most $2^w$, since $i$ and $q$ form a query, which takes at most one word.
Then by Lemma 8, the data structure can publish a set of $O(k\sqrt[h]{\sigma} \cdot w)$ cells, which contains at
least one cell from each interesting set. With these cells, the algorithm can handle successfully all
interesting $(i, q)$ queries.

The total number of cells that we publish is $O(k\sqrt[h]{\sigma} \cdot w)$. Thus, we publish $O(k\sqrt[h]{\sigma} \cdot w^2)$ new bits,
plus $O(k)$ bits from the assumed solution to $\bigoplus^k f^{(h)}$. For big enough $C$, this is at most $k\sqrt[h]{\sigma} \cdot Cw^2$.

3.2 An Analysis of $\bigoplus^k f^{(h)}$: Proof of Lemma 7

Our analysis has two parts. First, we ignore the help given by the published bits, by assuming they
are constantly set to some value $p$. As $\mathbf{r}^i$ and $\mathbf{Q}^i$ are chosen randomly, we show that the conditions
of (4) are met with probability at least $\frac{1}{h}$ times the success probability for subproblem $i$. This is
essentially a lower bound on $\varepsilon^i$, and hence on $\varepsilon^*$.

Secondly, we show that the published bits do not really affect this lower bound on $\varepsilon^*$. The
intuition is that there are too few published bits (much fewer than $k$) so for most subproblems
they are providing no information at all. That is, the behavior for that subproblem is statistically
close to when the published bits would not be used. Formally, this takes no more than a (subtle)
application of Chernoff bounds. The gist of the idea is to consider some setting $p$ for the published
bits, and all possible inputs (not just those leading to $p$ being published). In this probability space,
$\varepsilon^i$ are independent for different $i$, so the average is close to $\varepsilon^*$ with overwhelmingly high probability.
Now pessimistically assume all inputs where the average of $\varepsilon^i$ is not close to $\varepsilon^*$ are possible inputs,
i.e. inputs for which $p$ would be the real help bits. However, the probability of this event is so small,
that even after a union bound for all $p$, it is still negligible.

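We now proceed to the first part of the analysis. Let $\delta^i(p)$ be the probability that the query
algorithm is successful when receiving a random query for subproblem $i$. Formally, $\delta^i(p) = \Pr[\Gamma^i(p; \mathbf{q}) \ne
\emptyset \mid \mathbf{i} = i]$. We define $\delta^i(p \mid E)$, $\delta^i(p \mid X)$ and $\delta^*(\cdot)$ similar to the functions associated to $\varepsilon^i$. Observe
that the probability of correctness guaranteed by assumption is $\delta = \mathbf{E}_{\mathbf{r},\mathbf{Q},\mathbf{d}}[\delta^*(\mathbf{p}(\mathbf{r}, \mathbf{Q}, \mathbf{d}) \mid \mathbf{r}, \mathbf{Q}, \mathbf{d})]$.

Lemma 9. For any $i$ and $p$, we have $\varepsilon^i(p) \ge \frac{\delta^i(p)}{h}$.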

Proof. Let us first recall the random experiment defining $\varepsilon^i(p)$. We select a uniformly random
$r \in [h]$ and random $q_1, \ldots, q_{r-1}$. First we ask whether $\Gamma^i(p; q_1, \ldots, q_{r-1}) = \emptyset$. If not, we ask about
the probability that a random $q_r$ is good, in the sense of (4). Now let us rephrase the probability
space as follows: first select $q_1, \ldots, q_h$ at random; then select $r \in [h]$ and use just $q_1, \ldots, q_r$ as
above. The probability that the query $(q_1, \ldots, q_h)$ is handled successfully is precisely $\delta^i(p)$. Let us
assume it is. Then, for any $r$, $\Gamma^i(p; q_1, \ldots, q_{r-1}) \ne \emptyset$, because there is at least one suffix which
is handled successfully. We will now show that there is at least one choice of $r$ such that $q_r$ is good
when the prefix is $q_1, \ldots, q_{r-1}$. When averaged over $q_1, \ldots, q_{r-1}$, this gives a probability of at least
$\frac{\delta^i(p)}{h}$.

To show one good $r$, let $\varphi_r = \min\{|\Gamma^i(p; q_1, \ldots, q_{r-1})|, \sigma\}$. Now observe that $\frac{\varphi_1}{\varphi_2} \cdot \frac{\varphi_2}{\varphi_3} \cdots \frac{\varphi_h}{\varphi_{h+1}} =
\frac{\varphi_1}{\varphi_{h+1}} \le \varphi_1 \le \sigma$. By the pigeonhole principle, $(\exists) r: \frac{\varphi_r}{\varphi_{r+1}} \le \sigma^{1/h}$. This implies $|\Gamma^i(p; q_1, \ldots, q_r)| \ge
\frac{\min\{\sigma, |\Gamma^i(p; q_1, \ldots, q_{r-1})|\}}{\sqrt[h]{\sigma}}$. □
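
The pigeonhole step at the end of this proof can be illustrated with a small search. The sketch assumes the sizes $|\Gamma^i(p; q_1, \ldots, q_r)|$ for $r = 0, \ldots, h$ are given as a list that ends in a nonempty set; the function name and the sample sizes are hypothetical, and the comparison is done in integers to avoid computing $\sigma^{1/h}$ in floating point.

```python
def find_good_round(sizes, sigma):
    """Return r in 1..h with phi_r / phi_{r+1} <= sigma**(1/h).

    sizes[r] is the size of the cell set after fixing r query components
    (r = 0..h); the last entry must be at least 1.
    """
    h = len(sizes) - 1
    phi = [min(s, sigma) for s in sizes]  # phi[r] plays the role of phi_{r+1}
    for r in range(1, h + 1):
        # Integer form of phi[r-1] / phi[r] <= sigma**(1/h).
        if phi[r - 1] ** h <= sigma * phi[r] ** h:
            return r
    raise ValueError("no good round; the last set must be nonempty")

print(find_good_round([64, 8, 8, 8], 64))
```

Because the product of the $h$ ratios is at most $\sigma$, some ratio is at most $\sigma^{1/h}$, so the loop always returns when the final set is nonempty.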

Note that if the algorithm uses zero published bits, we are done. Thus, for the rest of the
analysis we may assume $\frac{1}{C}\left(\frac{\delta}{h}\right)^3 k \ge 1$. We now proceed to the second part of the analysis, showing
that $\varepsilon^*$ is close to the lower bound of the previous lemma, even after a union bound over all possible
published bits.

Lemma 10. With probability at least $1 - \frac{\delta}{8}$ over random $\mathbf{r}$, $\mathbf{Q}$ and $\mathbf{d}$: $(\forall) p: \varepsilon^*(p \mid \mathbf{r}, \mathbf{Q}, \mathbf{d}) \ge
\frac{\delta^*(p)}{h} - \frac{\delta}{4h}$.

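Proof. Fix $p$ arbitrarily. By definition, $\varepsilon^*(p \mid \mathbf{r}, \mathbf{Q}, \mathbf{d}) = \frac{1}{k}\sum_i \varepsilon^i(p \mid \mathbf{r}, \mathbf{Q}, \mathbf{d})$. By Lemma 9,
$\mathbf{E}[\varepsilon^i(p \mid \mathbf{r}, \mathbf{Q}, \mathbf{d})] = \varepsilon^i(p) \ge \frac{\delta^i(p)}{h}$, which implies $\varepsilon^*(p) \ge \frac{\delta^*(p)}{h}$. Thus, our condition can be rephrased as:

$$\frac{1}{k}\sum_i \varepsilon^i(p \mid \mathbf{r}, \mathbf{Q}, \mathbf{d}) \ge \mathbf{E}\left[\frac{1}{k}\sum_i \varepsilon^i(p \mid \mathbf{r}, \mathbf{Q}, \mathbf{d})\right] - \frac{\delta}{4h}$$

Now note that $\varepsilon^i(p \mid \mathbf{r}, \mathbf{Q}, \mathbf{d})$ only depends on $\mathbf{r}^i$, $\mathbf{Q}^i$ and $\mathbf{d}^i$, since we are looking at the behavior of
a query to subproblem $i$ for a fixed value of the published bits; see the definition of $\varepsilon^i$ in (4). Since
$(\mathbf{r}^i, \mathbf{Q}^i, \mathbf{d}^i)$ are independent for different $i$, it follows that $\varepsilon^i(p \mid \mathbf{r}, \mathbf{Q}, \mathbf{d})$ are also independent. Then
we can apply a Chernoff bound to analyze the mean $\varepsilon^*(p \mid \mathbf{r}, \mathbf{Q}, \mathbf{d})$ of these independent random
variables. We use an additive Chernoff bound [2]:

$$\Pr_{\mathbf{r},\mathbf{Q},\mathbf{d}}\left[\varepsilon^*(p \mid \mathbf{r}, \mathbf{Q}, \mathbf{d}) < \varepsilon^*(p) - \frac{\delta}{4h}\right] < e^{-\Omega\left((\delta/h)^2 k\right)}$$

Now we take a union bound over all possible choices $p$ for the published bits. The probability of
the bad event becomes $2^{\frac{1}{C}(\frac{\delta}{h})^3 k} e^{-\Omega((\frac{\delta}{h})^2 k)}$. For large enough $C$, this is $\exp(-\Omega((\frac{\delta}{h})^2 k))$, for any $\delta$
and $h$. Now we use that $\frac{1}{C}\left(\frac{\delta}{h}\right)^3 k \ge 1$, from the condition that there is at least one published bit, so
this probability is at most $e^{-\Omega(Ch/\delta)}$. Given that $\frac{h}{\delta} \ge 1$, this is at most $\frac{\delta}{8}$ for large enough $C$. □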

Unfortunately, this lemma is not exactly what we would want, since it provides a lower bound
in terms of $\delta^*(p)$. This probability of success is measured in the original probability space. As we
condition on $\mathbf{r}$, $\mathbf{Q}$ and $\mathbf{d}$, the probability space can be quite different. However, we show next that
in fact $\delta^*$ cannot change too much. As before, the intuition is that there are too few published bits,
so for most subproblems they are not changing the query distribution significantly.

Lemma 11. With probability at least $1 - \frac{\delta}{8}$ over random $\mathbf{r}$, $\mathbf{Q}$ and $\mathbf{d}$: $(\forall) p: \delta^*(p \mid \mathbf{r}, \mathbf{Q}, \mathbf{d}) \le
\delta^*(p) + \frac{\delta}{4}$.

Proof. The proof is very similar to that of Lemma 10. Fix $p$ arbitrarily. By definition, $\delta^*(p \mid \mathbf{r}, \mathbf{Q}, \mathbf{d})$
is the average of $\delta^i(p \mid \mathbf{r}, \mathbf{Q}, \mathbf{d})$. Note that for fixed $p$, $\delta^i$ depends only on $\mathbf{r}^i$, $\mathbf{Q}^i$ and $\mathbf{d}^i$. Hence, the
$\delta^i$ values are independent for different $i$, and we can apply a Chernoff bound to say the mean is
close to its expectation. The rest of the calculation is parallel to that of Lemma 10. □



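We combine Lemmas 10 and 11 by a union bound. We conclude that with probability at least
$1 - \frac{\delta}{4}$ over random $\mathbf{r}$, $\mathbf{Q}$ and $\mathbf{d}$, we have that $(\forall) p$:

$$\left.\begin{array}{l} \varepsilon^*(p \mid \mathbf{r}, \mathbf{Q}, \mathbf{d}) \ge \frac{\delta^*(p)}{h} - \frac{\delta}{4h} \\ \delta^*(p \mid \mathbf{r}, \mathbf{Q}, \mathbf{d}) \le \delta^*(p) + \frac{\delta}{4} \end{array}\right\} \Rightarrow \varepsilon^*(p \mid \mathbf{r}, \mathbf{Q}, \mathbf{d}) - \frac{\delta^*(p \mid \mathbf{r}, \mathbf{Q}, \mathbf{d})}{h} \ge -\frac{\delta}{2h} \qquad (6)$$

Since this holds for all $p$, it also holds for $p = \mathbf{p}$, i.e. the actual bits $\mathbf{p}(\mathbf{r}, \mathbf{Q}, \mathbf{d})$ published by
the data structure given its input. Now we want to take the expectation over $\mathbf{r}$, $\mathbf{Q}$ and $\mathbf{d}$. Because
$\varepsilon^*(\cdot), \delta^*(\cdot) \in [0, 1]$, we have $\varepsilon^*(\cdot) - \frac{1}{h}\delta^*(\cdot) \ge -\frac{1}{h}$. We use this as a pessimistic estimate for the cases
of $\mathbf{r}$, $\mathbf{Q}$ and $\mathbf{d}$ where (6) does not hold. We obtain: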
^ E[£*(p|r,Q,d)] >^E[,5*(p|i

4 Communication Lower Bounds 
4.1 Protocol Manipulations 

To obtain our improved lower bounds for large space, we use two-party communication complexity.
In this section, we state the protocol manipulation tools that we will use in our proof. We allow
protocols to make errors, and look at the error probability under appropriate input distributions.
Thus, as opposed to our lower bounds for small space, we also obtain lower bounds for randomized
algorithms with bounded error. We define an $[A; m_1, m_2, m_3, \ldots]$-protocol to be a protocol in which
Alice speaks first, sending $m_1$ bits. Bob then sends $m_2$ bits, Alice sends $m_3$ bits and so on. In a
$[B; m_1, m_2, \ldots]$-protocol, Bob begins by sending $m_1$ bits.

For a communication problem $f : A \times B \to \{0, 1\}$, define a new problem $f^{A,(k)}$ in which Alice
receives $x_1, \ldots, x_k \in A$, Bob receives $y \in B$, $i \in [k]$ and $x_1, \ldots, x_{i-1}$, and they wish to compute
$f(x_i, y)$. This is similar to our definition for $f^{(k)}$, except that we need to specify that Alice's input is
being multiplied. We define $f^{B,(k)}$ symmetrically, with the roles of Alice and Bob reversed. Finally,
given a distribution $\mathcal{D}$ for $f$, we define $\mathcal{D}^{A,(k)}$ and $\mathcal{D}^{B,(k)}$ following our old definition for $\mathcal{D}^{(k)}$.

The first tool we use is round elimination, which, as mentioned before, has traditionally been
motivated by predecessor lower bounds. The following is a strong version of this result, due to [16]:

Lemma 12 (round elimination [16]). Suppose $f^{A,(k)}$ has an $[A; m_1, m_2, \ldots]$-protocol with error
probability at most $\varepsilon$ on $\mathcal{D}^{A,(k)}$. Then $f$ has a $[B; m_2, \ldots]$-protocol with error probability at most
$\varepsilon + O(\sqrt{m_1/k})$ on $\mathcal{D}$.

As opposed to previous proofs, we also bring message compression into play. The following is
from [5], restated in terms of our $f^{A,(k)}$ problem:

Lemma 13 (message compression [5]). Suppose $f^{A,(k)}$ has an $[A; m_1, m_2, \ldots]$-protocol with
error probability at most $\varepsilon$ on $\mathcal{D}^{A,(k)}$. Then for any $\delta > 0$, $f$ has an $[A; O(\frac{1 + m_1/k}{\delta^2}), m_2, \ldots]$-
protocol with error probability at most $\varepsilon + \delta$ on $\mathcal{D}$.

Since this lemma does not eliminate Alice's message, but merely reduces it, it is used in conjunction
with the message switching technique [5]. If Alice's first message has $a$ bits, we can eliminate
it if Bob sends his reply to all possible messages from Alice (thus increasing his message by a factor
of $2^a$), and then Alice includes her first message along with the second one (increasing the second
message size additively by $a$):

Lemma 14 (message switching). Suppose $f$ has an $[A; m_1, m_2, m_3, m_4, \ldots]$-protocol. Then it
also has a $[B; 2^{m_1} m_2, m_1 + m_3, m_4, \ldots]$-protocol with the same error complexity.
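
The effect of message switching on the message lengths is simple bookkeeping. The sketch below models a protocol only by its list of message sizes, which is an assumption made for illustration; the function name is hypothetical.

```python
def switch_first_message(sizes):
    """Map the sizes [m1, m2, m3, m4, ...] of an Alice-first protocol to the
    sizes [2**m1 * m2, m1 + m3, m4, ...] of the Bob-first protocol of Lemma 14."""
    m1, m2, m3, *rest = sizes
    # Bob answers every possible first message; Alice then reveals which one she meant.
    return [2 ** m1 * m2, m1 + m3, *rest]

print(switch_first_message([3, 10, 5, 7]))
```

The exponential blow-up in Bob's message is why the switch is only applied after Alice's first message has been compressed.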

Message compression combined with message switching represent, in some sense, a generaliza- 
tion of the round elimination lemma, allowing us to trade a smaller k for a larger penalty in Bob's
messages. However, the trade-off does yield round elimination as the end-point, because message
compression cannot reduce Alice's message below for any k. We combine these two lemmas 

to yield a smooth trade-off (with slightly worse error bounds), which is easier to work with: 

Lemma 15. Suppose $f^{A,(k)}$ has an $[A; m_1, m_2, m_3, m_4, \ldots]$-protocol with error $\varepsilon$ on $\mathcal{D}^{A,(k)}$. Then for
any $\delta > 0$, $f$ has a $[B; 2^{O(m_1/(k\delta^4))} m_2, m_1 + m_3, m_4, \ldots]$-protocol with error probability $\varepsilon + \delta$ on $\mathcal{D}$.

Proof. If $\frac{m_1}{k} \le \delta^2$, we can apply the round elimination lemma. Then, Alice's first message is
omitted with an error increase of at most $\delta$. None of the subsequent messages change. If $\frac{m_1}{k} > \delta^2$,
we apply the message compression lemma, which reduces Alice's first message to $O(\frac{1 + m_1/k}{\delta^2})$ bits,
while increasing the error by $\delta$. Since $\frac{m_1}{k} > \delta^2$, the bound on Alice's message is at most $O(\frac{m_1}{k\delta^4})$.
Then, we can eliminate Alice's first message by switching. Note our bound for the second message
from Alice is loose, since it ignores the compression we have done. □

4.2 Application to Predecessor Search 

Theorem 16. Consider a solution to colored predecessor search in a set of $n$ $\ell$-bit integers, which
uses space $n \cdot 2^a$ in the cell-probe model with cells of $w$ bits. If $a = \Omega(\lg n)$ and the query algorithm
has an error probability of at most $\frac{1}{3}$, the query time must satisfy:

$$T = \Omega\left(\min\left\{\frac{\lg n}{\lg w},\ \frac{\lg(\ell/a)}{\lg\lg(\ell/a) + \lg(a/\lg n)}\right\}\right)$$

Proof. We consider the communication game in which Alice receives the query and Bob receives the
set of integers. Alice's messages will have $\lg(n \cdot 2^a) = \Theta(a)$ bits, and Bob's $w$ bits. The structure of
our proof is similar to the application of the cell-probe elimination lemma in Section 2. By Lemma 2,
we can identify the structure of $P(n, \ell)^{A,(h)}$ in $P(n, h\ell)$. Then, we can apply our Lemma 15
to eliminate Alice's messages. Now, we use Lemma 3 to identify the structure of $P(n, \ell)^{B,(t)}$ in
$P(n \cdot t, \ell + \lg t)$. Note that $P(n, \ell)^{B,(t)}$ is syntactically equivalent to our old $\bigoplus^t P(n, \ell)$, except that
Alice also receives a (useless) prefix of Bob's input. Now we apply the round elimination lemma to
get rid of Bob's message.

Thus, after eliminating a message from each player, we are left with another instance of the
colored predecessor problem, with smaller $n$ and $\ell$ parameters. This contrasts with our cell-probe
proof, which couldn't work with just one subproblem, but needed to look at all of them to analyze
sharing. Our strategy is to increase the error by at most $\frac{1}{6T}$ in each round of the previous argument.
Then, after $T$ steps, we obtain an error of at most $\frac{1}{3} + \frac{1}{6} < \frac{1}{2}$. Assuming we still have $n \ge 2$ and
$\ell \ge 1$, it is trivial to make the answer to the query be either red or blue with equal probability.
Then, no protocol with zero communication can have error complexity below $\frac{1}{2}$, so the original
cell-probe complexity had to be greater than $T$.

As explained in Section 2.3, the proof should be interpreted as an inductive argument in the
reverse direction. Assuming we have a distribution on which no protocol with $i$ rounds can have
error less than $\varepsilon$, our argument constructs a distribution on which no protocol with $i + 1$ rounds
can have error less than $\varepsilon - \frac{1}{6T}$. At the end, we obtain a distribution on which no protocol with $T$
rounds can have error $\frac{1}{3}$, implying the cell-probe lower bound.

It remains to define appropriate values $h$ and $t$ which maximize our lower bound $T$ by the above
discussion. After step $i$, Alice's message will have size $(i + 1) \cdot (a + \lg n) = O(aT)$, because we have
applied message switching $i$ times (in the form of Lemma 15). Applying Lemma 15 one more time
with S = we increase Bob's next message to w ■ 2'^('^^*/'*). We now apply round elimination
to get rid of Bob's message. We want an error increase of at most adding up to at most ^ 
per round. Then, we set t according to:

Let $n_i$ and $\ell_i$ denote the problem parameters after $i$ steps of our argument. Initially, $n_0 = n$, $\ell_0 = \ell$.
By the discussion above, we have the recursions: $\ell_{i+1} = \frac{\ell_i}{h} - \lg t$ and $\lg n_{i+1} = \lg\frac{n_i}{t} = \lg n_i - \lg t$.
We have Igt = 0(lgi(; + IgT + ^^). Since we want > 1, we must have T < lg£ < Igw, so 
\gt simplifies to 0(\gw + ^^). Now the condition ut >2 implies the following bound on T: 

lgn-1 f . fig"- hlgn 

1 < =^ = W mm 



To analyze the condition It ^ 1, we apply the recursion bound of Lemma 6, implying T < 
^gfe( e(igf) )" ^^^^ satisfied if we upper bound Igt by 0((a + Igui) • (1 + -7^)), and set: 

^^g('M=±felzM^Uef!5<^Vo(igr) ^ T<eC^^] 

\ J \ Igh J y Igh J 

Thus, our lower bound is, up to constant factors, min{]^, First we argue 

that we can simplify a + lgw to just a in the last term. If a = fl{lgw), this is trivial. Otherwise, 
we have a = 0(lgu;), so Ign = 0(lgu;). But in this case the first term of the min is 0(1) anyway, 
so the other terms are irrelevant. 

It now remains to choose h in order to maximize the lower bound. This is achieved when 
^ = so we should set \gh = e(lglgf + Igj^ + IgT). The IgT term can be ig- 

nored because T = O(lg^). With this choice of h, the lower bound becomes, up to constants, 

™™tlg«,> lglg(£/a)+lg(a/lgn)-r- '-' 



5 Upper Bounds 

We are working on the static predecessor problem where we are first given a set $Y$ of $n$ keys. The
predecessor of a query key $x$ in $Y$ is the largest key in $Y$ that is smaller than or equal to $x$. If $x$ is
smaller than any key in $Y$, its predecessor is $-\infty$, representing a value smaller than any possible
key. Below each key is assumed to be a non-negative $\ell$-bit integer. We are working on a RAM with
word length $w \ge \ell$. The results also apply in the stronger external memory model where $w$ is the
bit size of a block. The external memory model is stronger because, like the cell-probe model, it
does not count computations.

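For $n \le s$ and $\lg n \le \ell \le w$, we will show how to represent $n$ $\ell$-bit keys using $O(s\ell)$ bits of space where
$s \ge n$. With $a = \lg\frac{s}{n} + \lg w$, we will show how to search predecessors in time

$$O\left(\frac{\lg n}{\lg w}\right) \qquad (7)$$

$$O\left(\lg\frac{\ell - \lg n}{a}\right) \quad \text{if } \ell = O(\lg n) \qquad (8)$$

$$O\left(\frac{\lg\frac{\ell}{a}}{\lg\left(\frac{a}{\lg n} \cdot \lg\frac{\ell}{a}\right)}\right) \quad \text{if } a \ge \lg n \text{ and } \ell = \omega(\lg n) \qquad (9)$$

$$O\left(\frac{\lg\frac{\ell}{a}}{\lg\left(\lg\frac{\ell}{a} \,/\, \lg\frac{\lg n}{a}\right)}\right) \quad \text{if } a \le \lg n \text{ and } \ell = \omega(\lg n) \qquad (10)$$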



Contents. Below, we first obtain (7) using either B-trees or the fusion trees of Fredman and
Willard [10]. Next we use (7) to increase the space by a factor $w$ so that we have $O(2^a\ell)$ bits of
space available per key. Then we prove (8) by a slight tuning of van Emde Boas' data structure [19].
This bound is tight when $w = O(\lg n)$. Next, elaborating on techniques of Beame and Fich [4], we
will first show (9) and then (10) in the case where $w \ge 2\lg n$.



5.1 Preliminaries 

In our algorithms, we will assume that $w$, $\ell$, and $a$ are powers of two. For the word length $w$, we note
that we can simulate up to twice the word length by implementing each extended word operation with
a constant number of regular operations. Hence, internally, our algorithms can use a word length
rounded up to the nearest power of two without affecting the asymptotic search times. Concerning
the parameters $\ell$ and $a$, we note that it does not affect the asymptotics if they change by constant
factors, so we can freely round $\ell$ up to the nearest power of two and $a$ down to the nearest power
of two, thus accepting a larger key length and lesser space for the computations.

The search times will be achieved via a series of reductions that often reduce the key length $\ell$.
We will make sure that each reduction is by a power of two.

We will often allocate arrays with $m$ entries, each of $\ell$ bits. These occupy $m\ell$ consecutive
bits in memory, possibly starting and ending in the middle of words. As long as $\ell \le w$, using
simple arithmetic and shifts, we can access or change an entry in constant time. In our case, the
calculations are particularly simple because $\ell$ and $w$ are powers of two.
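
The packed arrays described above can be sketched as follows. This is an illustration under explicit assumptions: $\ell$ divides $w$ (which holds when both are powers of two with $\ell \le w$) and the array starts at a bit offset that is a multiple of $\ell$, so no entry straddles a word boundary. The class name `PackedArray` and its parameters are hypothetical.

```python
class PackedArray:
    """m entries of ell bits each, stored in consecutive bits of w-bit words."""

    def __init__(self, m, ell, w, offset_bits=0):
        # Assumptions: ell divides w and the start offset is a multiple of ell.
        assert ell <= w and w % ell == 0 and offset_bits % ell == 0
        self.ell, self.w, self.off = ell, w, offset_bits
        total_bits = offset_bits + m * ell
        self.words = [0] * ((total_bits + w - 1) // w)
        self.mask = (1 << ell) - 1

    def _locate(self, i):
        bit = self.off + i * self.ell
        return bit // self.w, bit % self.w   # (word index, shift inside the word)

    def get(self, i):
        word, shift = self._locate(i)
        return (self.words[word] >> shift) & self.mask

    def set(self, i, value):
        word, shift = self._locate(i)
        cleared = self.words[word] & ~(self.mask << shift)
        self.words[word] = cleared | ((value & self.mask) << shift)

arr = PackedArray(10, 8, 64, offset_bits=24)
arr.set(5, 200)
print(arr.get(5), len(arr.words))
```

Each access uses one division, one remainder, a shift and a mask; with powers of two the division and remainder are themselves shifts and masks.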

We will use product notation for concatenation of bit strings. Hence $xy$ or $x \cdot y$ denotes the
concatenation of bit strings $x$ and $y$. As special notation, we define $-\infty \cdot x = x \cdot -\infty = -\infty$.

Finally, we define $\log = \log_2$. Note that this is different from $\lg$ which is the function used in
our asymptotic bounds.



5.2 Fusion or B-trees 

With Fredman and Willard's fusion trees [10], we immediately get a linear space predecessor search
time of $O(\frac{\lg n}{\lg \ell})$. If $w \le \ell^2$, this implies the $O(\frac{\lg n}{\lg w})$ search time
from (7). Otherwise, we use a B-tree of degree $d = w/\ell$. We can pack the $d$ keys in a single word,
and we can then search a B-tree node in constant time using some of the simpler bit manipulation
from [10]. This gives a search time of $O(\frac{\lg n}{\lg (w/\ell)})$, which is $O(\frac{\lg n}{\lg w})$
for $w > \ell^2$. Thus we achieve the search time from (7) using only linear space, or $O(n\ell)$ bits.
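To illustrate how a packed B-tree node can be searched with a constant number of word operations, here is a minimal sketch of the standard parallel-comparison trick. It is an assumption-laden illustration rather than the construction of [10]: each key gets a field of $\ell + 1$ bits (one spare test bit), Python integers stand in for machine words, and the final popcount is done with `bin(...).count`, where a real implementation would use a multiplication or a hardware instruction. The names `pack_node` and `rank_in_node` are hypothetical.

```python
def pack_node(keys, ell):
    """Pack ell-bit keys into one integer, one (ell+1)-bit field per key.

    The top bit of every field is left at 0; it serves as a test bit.
    """
    field = ell + 1
    packed = 0
    for i, k in enumerate(keys):
        packed |= k << (i * field)
    return packed

def rank_in_node(packed, d, ell, x):
    """Return the number of keys k in the node with k <= x."""
    field = ell + 1
    ones = sum(1 << (i * field) for i in range(d))  # constant 0..01 0..01, precomputable
    high = ones << ell                              # the test bit of every field
    # Field i of the difference holds 2^ell + x - k_i >= 1, so no borrow
    # crosses a field boundary; its test bit survives iff x >= k_i.
    diff = (high | (x * ones)) - packed
    return bin(diff & high).count("1")
```

With sorted keys in the node, the returned rank directly identifies the child to descend into.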

We note that the B-tree solution is simpler in the external memory model where we do not 
worry about the actual computations. 

5.3 Increasing the space

We will now use (7) to increase the space per key by a factor $w$. We simply pick out a set $Y'$ of
$n' = \lfloor n/w \rfloor$ equally spaced keys so that we have a segment of less than $w$ keys between
consecutive keys in $Y'$. We will first do a predecessor search in $Y'$, and based on the result, do a
predecessor search in the appropriate segment. Since the segment has less than $w$ keys, by (7), it can
be searched in constant time.
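The two-level search just described can be sketched as below. This is an illustration only: binary search (`bisect`) stands in both for the predecessor structure on $Y'$ and for the constant-time search inside a segment, and the names `build_samples` and `pred` are introduced here.

```python
from bisect import bisect_right

NEG_INF = float("-inf")

def build_samples(Y, w):
    """Pick every w-th key of the sorted list Y, giving floor(n/w) samples."""
    return Y[w - 1::w]

def pred(Y, samples, w, x):
    """Predecessor of x in sorted Y: search the samples, then one short segment."""
    j = bisect_right(samples, x)                 # number of sampled keys <= x
    lo, hi = j * w, min(len(Y), (j + 1) * w - 1) # segment with fewer than w keys
    i = bisect_right(Y, x, lo, hi)
    if i > lo:
        return Y[i - 1]                          # answer lies inside the segment
    return samples[j - 1] if j > 0 else NEG_INF  # otherwise it is the sample itself
```

Only the structure over the samples needs the larger space budget; the segments stay in the original sorted array.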

Thus we are left with the problem of doing a predecessor search in $Y'$. For this we have $O(s\ell)$
bits, which is $O(s\ell/(n/w)) = O(sw\ell/n) = O(2^a\ell)$ bits per key. Moreover, we note that replacing
$n$ by $n' = \lfloor n/w \rfloor$ does not increase any of the bounds (8)-(10). Hence it suffices to prove
these bounds (8)-(10) assuming that we have $O(n2^a\ell)$ bits of space available.

5.4 A tuned van Emde Boas bound for polynomial universes 

In this section, we develop a tuned version of van Emde Boas's data structure, representing $n$ keys
in $O(n2^a\ell)$ bits of space providing the search time from (8) of $O(\lg\frac{\ell - \lg n}{a})$.

We shall only use this bound for polynomial universes, that is, when $\ell = O(\lg n)$.

5.4.1 Complete tabulation 

The static predecessor problem is particularly easy when we have room for a complete tabulation
of all possible query keys, that is, if we have $O(2^\ell\ell)$ bits of space. Then we can allocate a table
$\mathrm{pred}_Y$ that for each possible query key $x$ stores the predecessor $\mathrm{pred}_Y[x]$ of $x$
in $Y$. If $x < \min Y$, $\mathrm{pred}_Y[x] = -\infty$. For our bounds, we will use this as a base case
if $\ell \le a$.

Note that this simple base case is a prime example of what we can do when not restricted to
comparisons on a pointer machine: we use the key as an address to a table entry and get the answer
in constant time.
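A minimal sketch of the complete tabulation, assuming keys are non-negative integers below $2^\ell$ and using `float("-inf")` as a stand-in for $-\infty$ (the name `tabulate` is introduced here):

```python
NEG_INF = float("-inf")

def tabulate(Y, ell):
    """Return a table with table[x] = predecessor of x in Y for every ell-bit x."""
    members = set(Y)
    table, last = [], NEG_INF
    for x in range(1 << ell):
        if x in members:
            last = x          # x is its own predecessor
        table.append(last)    # otherwise inherit the largest key seen so far
    return table
```

A query is then the single lookup `table[x]`, which is the constant-time addressing step the text refers to.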

5.4.2 Prefixes tabulation 

If our keys are too long for a complete tabulation, but not too much longer, it may still be relevant
to use tabulation based on the first $p$ bits of each key. Below it is understood that the prefix of
a key is the first $p$ bits and the suffix is the last $\ell - p$ bits. Let $\mathrm{Suff}_Y[u]$ be the
suffixes in $Y$ of keys with prefix $u$. Also let $\mathrm{pred}_Y[u]$ denote the strict predecessor in $Y$
of $u$ suffixed by zeroes. Here by strict predecessor, we mean an unequal predecessor. If no length $p$
prefix in $Y$ is smaller than $u$, $\mathrm{pred}_Y[u] = -\infty$. The representation of $Y$ now consists
of the table that with each prefix $u$ associates $\mathrm{pred}_Y[u]$ and a recursive representation of
$\mathrm{Suff}_Y[u]$. Note that if $\mathrm{Suff}_Y[u] = \emptyset$, the recursive representation returns
$-\infty$ on any predecessor query.
We now have the following pseudo-code for searching $Y$:

Pred(x, Y)
    (x_0, x_1) = (prefix(x), suffix(x))
    y_1 = Pred(x_1, Suff_Y[x_0])
    if y_1 = -∞ then return pred_Y[x_0]
    return x_0 · y_1

Note in the above pseudo-code that the procedure terminates as soon as it executes a return
statement. Hence the last statement is only executed if $y_1 \ne -\infty$. Also, as a rule of thumb, we
use square brackets around the argument of a function that we can compute in constant time.

We shall use this reduction with $p \approx \lg n$ as the first step of our predecessor search. More
precisely, we choose $p \le \lg n$ such that the reduced length $\ell - p$ is a power of two less than
$2(\ell - \lg n)$. The reduction adds a constant to the search time. It uses $O(2^p\ell) = O(n\ell)$ bits of
space on the tables over the prefixes. The suffix of a key $y$ appears in the subproblem
$\mathrm{Suff}_Y[u]$ of its prefix $u$. Hence the subproblems have a total of $n$ keys, each of length less
than $2(\ell - \lg n)$.
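The prefix tabulation can be sketched as follows. This is a simplified illustration, not the paper's full structure: the recursive representation of each $\mathrm{Suff}_Y[u]$ is replaced by a sorted list searched with `bisect`, and the function names are introduced here.

```python
from bisect import bisect_right

NEG_INF = float("-inf")

def build_prefix_table(Y, ell, p):
    """For every p-bit prefix u, tabulate pred_Y[u] and the sorted list Suff_Y[u]."""
    s = ell - p                                   # suffix length
    suff = [[] for _ in range(1 << p)]
    for y in sorted(Y):
        suff[y >> s].append(y & ((1 << s) - 1))
    pred_table, last = [], NEG_INF
    for u in range(1 << p):
        pred_table.append(last)                   # largest key with a smaller prefix
        if suff[u]:
            last = (u << s) | suff[u][-1]
    return pred_table, suff

def pred(x, pred_table, suff, ell, p):
    """Follow the pseudo-code: search the suffix subproblem, else fall back to pred_Y."""
    s = ell - p
    x0, x1 = x >> s, x & ((1 << s) - 1)
    i = bisect_right(suff[x0], x1)                # stand-in for Pred(x1, Suff_Y[x0])
    if i == 0:                                    # the subproblem answered -infinity
        return pred_table[x0]
    return (x0 << s) | suff[x0][i - 1]            # concatenate x0 with the suffix found
```

The tables have $2^p$ entries, which is why $p$ is kept at most $\lg n$.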

5.4.3 Van Emde Boas' reduction 

The essential component of van Emde Boas' data structure is a reduction that halves the key length
$\ell$. Trivially this preserves that key lengths are powers of two. To do this halving, we would like
to use the reduction above with prefix length $p = \ell/2$. However, if $\ell$ is too large, we do not have
$O(2^p\ell)$ bits of space for tabulating all prefixes. As a limited start, we can use hashing to tabulate
the above information for all prefixes of keys in $Y$. Let $U$ be the set of these prefixes. For all
$u \in U$, as above, we store $\mathrm{pred}_Y[u]$ and a recursive representation of $\mathrm{Suff}_Y[u]$.
We can then handle all queries $x$ with a prefix in $U$. However, if the prefix $x_0$ of $x$ is not in $U$,
we need a way to compute $\mathrm{pred}_Y[x_0]$.

To compute $\mathrm{pred}_Y[x_0]$ for a prefix $x_0$ not in $U$, we use a recursive representation of $U$.
Moreover, with each $u \in U$, we store the maximal key $\max_Y[u]$ in $Y$ with prefix $u$, and we define
$\max_Y[-\infty] = -\infty$. We now first compute the predecessor $y_0$ of $x_0$ in $U$, and then we return
$\max_Y[y_0]$.

The above reduction spends constant time on halving the key length but the number of keys
may grow in that a key $x = x_0x_1 \in Y$ has $x_0$ in the subproblem $U$ and $x_1$ in the subproblem
$\mathrm{Suff}_Y[x_0]$. A general solution is that instead of recursing directly on a subproblem $Z$, we
remove the maximal key, treating it separately, thus only recursing on $Z^- = Z \setminus \{\max Z\}$.

In our concrete case, we will consider the reduced recursive subproblems $\mathrm{Suff}^-_Y[u]$. We then
have the following recursive pseudo-code for searching the predecessor of $x$:

Pred(x, Y)
    (x_0, x_1) = (prefix(x), suffix(x))
    if x_0 ∉ U then return max_Y[Pred(x_0, U)]
    if x ≥ max_Y[x_0] then return max_Y[x_0]
    y_1 = Pred(x_1, Suff^-_Y[x_0])
    if y_1 = -∞ then return pred_Y[x_0]
    return x_0 · y_1

The key lengths have been halved to $\ell' = \ell/2$. We have $n - |U|$ half keys in the suffix
subproblems $\mathrm{Suff}^-_Y[x_0]$, and $|U|$ half keys in the prefix subproblem $U$, so the total number
of keys is $n$. As described above, the space used by the reduction is $O(n\ell)$ bits.
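The reduction can be turned into a small runnable sketch. The following is an illustration under assumptions, not the tuned structure of the paper: Python dictionaries stand in for the hash tables over $U$, the recursion bottoms out in a complete tabulation once the key length is at most `base` bits (a parameter introduced here in place of $a$), and the class name `StaticVEB` is hypothetical.

```python
NEG_INF = float("-inf")

class StaticVEB:
    """Static predecessor search over ell-bit keys, ell a power of two."""

    def __init__(self, keys, ell, base=2):
        keys = sorted(set(keys))
        self.ell, self.empty, self.table = ell, not keys, None
        if self.empty:
            return                                   # never recurse on empty subproblems
        if ell <= base:                              # base case: complete tabulation
            members, last, self.table = set(keys), NEG_INF, []
            for x in range(1 << ell):
                last = x if x in members else last
                self.table.append(last)
            return
        half = ell // 2
        groups = {}                                  # prefix u -> sorted suffixes
        for y in keys:
            groups.setdefault(y >> half, []).append(y & ((1 << half) - 1))
        self.max_y, self.pred_y, self.suff = {}, {}, {}
        prev_max = NEG_INF
        for u in sorted(groups):
            self.pred_y[u] = prev_max                # strict predecessor of u·0...0
            self.max_y[u] = prev_max = (u << half) | groups[u][-1]
            self.suff[u] = StaticVEB(groups[u][:-1], half, base)  # Suff^-: maximum removed
        self.U = StaticVEB(list(groups), half, base) # recursive structure over prefixes

    def pred(self, x):
        if self.empty:
            return NEG_INF
        if self.table is not None:
            return self.table[x]
        half = self.ell // 2
        x0, x1 = x >> half, x & ((1 << half) - 1)
        if x0 not in self.max_y:                     # prefix not in U: recurse on U only
            y0 = self.U.pred(x0)
            return NEG_INF if y0 == NEG_INF else self.max_y[y0]
        if x >= self.max_y[x0]:
            return self.max_y[x0]
        y1 = self.suff[x0].pred(x1)                  # recurse on the suffix subproblem only
        return self.pred_y[x0] if y1 == NEG_INF else (x0 << half) | y1
```

Each call to `pred` makes at most one recursive call on keys of half the length, which mirrors the constant cost per halving claimed in the text.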

5.4.4 The final combination 

To solve the predecessor search problem in $O(n2^a\ell)$ bits of space, we will first tabulate a prefix
of length $p \le \lg n$ as described in Section 5.4.2, thus reducing the key length to
$\ell - p < 2(\ell - \lg n)$, which is a power of two. Then we apply the van Emde Boas reduction recursively
as described in Section 5.4.3, until we get down to a key length of at most $a$. This requires
$O(\lg\frac{\ell - \lg n}{a})$ recursions. We do not recurse on empty subproblems. For these we know that
the predecessor is always $-\infty$. Finally we use the complete tabulation on each subproblem as
described in Section 5.4.1.

Since each reduction adds a constant to the search time, the search time of our solution is
$O(\lg\frac{\ell - \lg n}{a})$. The first reduction uses $O(n\ell)$ bits of space, and the last uses
$O(2^a\ell)$ bits of space per subproblem. Since the subproblems are non-empty, this is $O(n2^a\ell)$ bits
of space in total. Each van Emde Boas recursion uses $O(n\ell)$ bits of space where $\ell$ is the current
key length. Since $\ell$ is halved each time, the space of the first iteration with the original key
length $\ell$ dominates. Thus we have proved:

Lemma 17. Using $O(n2^a\ell)$ bits of space, we can represent $n$ $\ell$-bit keys so that we can search
predecessors in $O(\lg\frac{\ell - \lg n}{a})$ time.
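The space accounting behind Lemma 17 is a geometric sum. As a sketch, write $\ell_0 = \ell - p$ for the key length after the prefix tabulation, so that level $i$ of the van Emde Boas recursion works with keys of length $\ell_0/2^i$ (the symbol $\ell_0$ is introduced here for illustration). Each level holds at most $n$ keys and there are at most $n$ non-empty base subproblems, so:

```latex
\underbrace{O(n\ell)}_{\text{prefix tables}}
\;+\; \sum_{i \ge 0} O\!\left(n\,\frac{\ell_0}{2^{i}}\right)
\;+\; \underbrace{O(n 2^{a}\ell)}_{\text{base tables}}
\;=\; O(n\ell) + O(n\ell_0) + O(n 2^{a}\ell)
\;=\; O(n 2^{a}\ell).
```

The last step uses $\ell_0 < 2(\ell - \lg n) \le 2\ell$.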

5.5 Reduction a la Beame and Fich 

In this section, we will derive better bounds for larger universes using a reduction very similar to one
used by Beame and Fich [4]. Our version of the reduction is captured in the following proposition:

Proposition 18. Consider an instance of the static predecessor search problem with $n$ keys of
length $\ell$. Choose integer parameters $q \ge 2$ and $h \ge 2$ where $h$ divides $\ell$. We can now reduce
into subproblems, each of which is easier in one of two ways:

length reduced The key length in the subproblem is reduced by a factor $h$ to $\ell/h$, and the
subproblem contains at most half the keys.

cardinality reduced The number of keys is reduced by a factor $q$ to $n/q$.

The reduction costs a constant in the query time. For some number $m$ determined by the reduction,
the reduction uses $O((q^{2h} + m)\ell)$ bits of space. The total number of keys in the cardinality reduced
subproblems is at most $n - m$, and the total number of keys in the length reduced subproblems is at
most $m$.

The original reduction of Beame and Fich [4, Section 4.2] is specialized towards their overall
quadratic space solution, and had an assumption that $\ell \le w/h$. They satisfy this assumption by
first applying van Emde Boas' reduction $\lg h$ times. This works fine in their case, but here we
consider solutions to the predecessor search problem where we get down to constant query time
using large space, and then their assumption would be problematic.

5.5.1 Larger space

Recall that we are looking for a solution to the predecessor problem using $O(n2^a\ell)$ bits of space.
Our first simple solution assumes $a \ge \lg n$ and $\ell = \omega(\lg n)$. With some $h$ to be fixed later,
we apply Proposition 18 recursively with $q$ fixed as $2^{a/(2h)}$. Here $a$ and $h$ are assumed powers
of two. Then the bit space used in each recursive step is $O(2^a\ell)$. Since no subproblem has more than
half the keys, the recursion tree has no degree 1 nodes. Hence we have at most $n - 1$ recursive nodes,
so the total space used in the recursive steps is $O(n2^a\ell)$ bits.

As described in Section 5.4.1, we can stop recursing when we get down to key length $a$, so
the number of length reductions in a branch is at most $\lg_h\frac{\ell}{a}$. On the other hand, the
number of cardinality reductions in a branch is at most $\lg_{2^{a/(2h)}} n = \frac{2h\lg n}{a}$. Thus,
for $n \le s$, the recursion depth is at most

$$\lg_h\frac{\ell}{a} + \frac{2h\lg n}{a}.$$

This expression is minimized with

$$h = \Theta\left(\frac{\frac{a}{\lg n}\lg\frac{\ell}{a}}{\lg\left(\frac{a}{\lg n}\lg\frac{\ell}{a}\right)}\right)$$

and then we get a query time of

$$O\left(\frac{\lg\frac{\ell}{a}}{\lg\left(\frac{a}{\lg n}\lg\frac{\ell}{a}\right)}\right).$$

Except for the division by $a$ inside the logarithms, the above bound is equivalent to one anticipated
without any proof or construction in [18]. We shall prove that this bound is tight.
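The trade-off in the choice of $h$ can be explored numerically. The sketch below evaluates the depth bound $\lg_h\frac{\ell}{a} + \frac{2h\lg n}{a}$ for powers of two $h$ and picks the best one; ordinary base-2 logarithms are used in place of $\lg$, and the function names are introduced here for illustration.

```python
from math import log2

def depth(ell, a, n, h):
    """Depth bound: length reductions lg_h(ell/a) plus cardinality reductions 2h*lg(n)/a."""
    return log2(ell / a) / log2(h) + 2 * h * log2(n) / a

def best_h(ell, a, n):
    """Among powers of two 2 <= h <= a, return the h with the smallest depth bound."""
    candidates = [2 ** i for i in range(1, int(log2(a)) + 1)]
    return min(candidates, key=lambda h: depth(ell, a, n, h))
```

Small $h$ makes the first term large and large $h$ makes the second term large, so the minimum sits where the two terms are of the same order, as in the choice of $h$ above.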

5.5.2 Smaller space 

We now consider the case where we start with a problem with $\lg n \ge a/2$ and $\ell = \omega(\lg n)$.
We are now going to apply Proposition 18 recursively with a fixed value of $h$ which is a power of two,
but with a changing value of $q$, stopping when we get a subproblem with only one key, or where the
key length is at most $a$. While $\lg n \ge a/2$, we use Proposition 18 recursively with
$q = \lfloor n^{1/(4h)} \rfloor$. However, when we get down to $n < 2^{a/2}$ keys, we use $q = 2^{a/(2h)}$.

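Lemma 19. The above construction uses $O(n2^a\ell)$ bits of space and the search time is

$$O\left(\frac{\lg\frac{\ell}{a}}{\lg\left(\lg\frac{\ell}{a} \Big/ \lg\frac{\lg n}{a}\right)}\right).$$
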

Proof. First we analyze the search time which is the recursion depth. Since we start with key
length at most $\ell$ and finish if we get to $a$, the number of length reductions in a recursion branch
is $O(\lg_h\frac{\ell}{a})$.

For the cardinality reductions, while $n \ge 2^{a/2}$, we note that it takes less than $4h$ reductions to
get from $n$ to $\sqrt{n}$ keys. More precisely, in each of these reductions, we have
$q \ge \sqrt{n}^{1/(4h)}$, and then it takes less than

$$\frac{\lg\sqrt{n}}{\lg\left(\sqrt{n}^{1/(4h)}\right)} = 4h$$

cardinality reductions to get down to $\sqrt{n}$ keys. Thus it takes $4h$ cardinality reductions to halve
$\log n$, so to get from the original value and down to $a/2$, we need at most
$4h\lceil\lg\frac{\log n}{a/2}\rceil = O(h\lg\frac{\lg n}{a})$ cardinality reductions. In the above argument,
we have ignored that $q$ is rounded down to the nearest integer. However, since $h$ is a power of two,
we can use the same argument to show that we can have at most $4h$ iterations while
$\log n \in [2^i, 2^{i+1})$.

Finally, starting from $n < 2^{a/2}$ keys and using $q = 2^{a/(2h)}$, we use at most $h$ cardinality
reductions to get down to a single key. Thus, the total number of cardinality reducing reductions
is $O(h\lg\frac{\lg n}{a})$. It follows that the total recursion depth is at most

$$O\left(\frac{\lg\frac{\ell}{a}}{\lg h} + h\lg\frac{\lg n}{a}\right).$$

The search time stated in the lemma is obtained setting

$$h = \Theta\left(\frac{\lg\frac{\ell}{a}}{\lg\frac{\lg n}{a}} \Big/ \lg\frac{\lg\frac{\ell}{a}}{\lg\frac{\lg n}{a}}\right).$$

The bit space bound in each reductive step is $O((q^{2h} + m)\ell)$. We will add up each term
separately over the whole recursion tree.

For the $O(m\ell)$ bound, we note that at least $m$ keys get reduced to length $\ell/h \le \ell/2$. Thus,
the total bit length of the keys is reduced by at least $m\ell/2$, so we use $O(1)$ bits per key bit saved.
Starting with $n\ell$ key bits, the total space used is $O(n\ell)$.

Next, concerning the $O(q^{2h}\ell)$ bound, we have two cases. When $q = \lfloor n^{1/(4h)} \rfloor$, the bit
space used is $O(\sqrt{n}\,\ell)$. This is $O(\ell/\sqrt{n})$ bits of space per key in the recursion.
Following a key $x$ down the branch, we know that the number of keys is halved in each step, and this
means that the space assigned to $x$ is increased by a factor $\sqrt{2}$. Thus, the total space assigned
to $x$ is dominated by the last recursion, hence $O(\ell)$ bits. Thus, over all the keys, we get $O(n\ell)$
bits of space for this case.

Finally, when $q = 2^{a/(2h)}$, the bit space is $O(2^a\ell)$, and then the at most $n - 1$ recursive nodes
give a bit space bound of $O(n2^a\ell)$. Thus the whole thing adds up to $O(n2^a\ell)$ bits of space, as
desired. □
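The claim that fewer than $4h$ cardinality reductions take $n$ keys down to $\sqrt{n}$ keys can be checked numerically. The sketch below ignores the rounding of $q$, as the proof does at first, and uses floating point arithmetic; the function name is introduced here.

```python
from math import sqrt

def reductions_to_sqrt(n, h):
    """Count steps n -> n/q with q = n**(1/(4h)) (unrounded) until at most sqrt(n) keys remain."""
    target, steps = sqrt(n), 0
    while n > target:
        n /= n ** (1 / (4 * h))   # q is recomputed from the current number of keys
        steps += 1
    return steps
```

Each step multiplies $\log n$ by $1 - \frac{1}{4h}$, so roughly $4h \ln 2$ steps halve it, comfortably below the $4h$ used in the proof.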



5.5.3 Proof of Proposition 18 

In this section, we prove Proposition 18:

Consider an instance of the static predecessor search problem with $n$ keys of length $\ell$.
Choose integer parameters $q \ge 2$ dividing $n$ and $h \ge 2$ dividing $\ell$. We can now reduce
into subproblems, each of which is easier in one of two ways:

length reduced The key length in the subproblem is reduced by a factor $h$ to $\ell/h$, and
the subproblem contains at most half the keys.

cardinality reduced The number of keys is reduced by a factor $q$ to $n/q$.

The reduction costs a constant in the query time. For some number $m$ determined by
the reduction, the reduction uses $O((q^{2h} + m)\ell)$ bits of space. The total number of keys
in the cardinality reduced subproblems is at most $n - m$, and the total number of keys
in the length reduced subproblems is at most $m$.

In the proof below we will ignore the requirement that a length reduced subproblem should contain
at most half the keys. If one of these subproblems ends up with too many keys, we can just split
it around the median, adding only a constant to the search time.

We will view each key $x$ as a vector $x_1 \cdots x_h$ of $h$ characters, each of $c = \ell/h$ bits. We now
provide an alternative to the parallel hashing in [4, Lemma 4.1]. The most significant difference is
that our lemma does not require a word length that is $h$ times bigger than $\ell$. Besides, the statement
is more directly tuned for our construction.

Lemma 20. Using $O(q^{2h}\ell)$ bits of space, we can store a set $Z = \{z^1, \ldots, z^q\}$ of $q$
$h$-character keys so that given a query key $x$, we can in constant time find the number of whole
characters in the longest common prefix between $x$ and any key in $Z$.

Proof. Andersson et al. [3, Section 3] have shown that we can in constant time apply certain universal
hash functions $H_1, \ldots, H_h$ in parallel to the characters in a word, provided that the hash values
are no bigger than the characters hashed. Thus, for each $i$ independently, and for any two different
characters $x \ne y$, if the hash values are in $[m]$, then $\Pr[H_i(x) = H_i(y)] \le 1/m$. Given
$x = x_1 \cdots x_h$, we return $H_1(x_1) \cdots H_h(x_h)$ in constant time. However, the hashed key has
the same length as the original key. More precisely, if the characters have $c$ bits and the hashed
characters are in $[2^b]$, then we have $c - b$ leading zeros in the representation of $H_i(x_i)$.

We will map each character to $b = 2\lg q$ bits. We may here assume that $b < c$, for otherwise,
we can tabulate all possible keys in $q^{2h}\ell$ bits of space. For each character position $i$, we have $q$
characters $z_i^j$, and for random $H_i$ these are all expected to hash to different values. In particular,
we can choose an $H_i$ without collisions on $\{z_i^j\}_{1 \le j \le q}$. Now if $x_i = z_i^j$ we have
$H_i(x_i) = H_i(z_i^j)$ and there is no $z_i^{j'} \ne z_i^j$ with $H_i(x_i) = H_i(z_i^{j'})$.

Next, consider the set $A$ of values $H_1(x_1) \cdots H_h(x_h)$ over all possible vectors
$x = x_1 \cdots x_h$. These vectors are $ch$ long, but since only the $b$ least significant bits are used
for the hash values of each character, there are at most $2^{bh}$ different values in $A$. Using the linear
space 2-level hashing of Fredman et al. [8], we construct a hash table $\mathcal{H}$ over $A$ using
$O(2^{bh}\ell)$ bits of space. With the entry $\mathcal{H}(H_1(x_1) \cdots H_h(x_h))$, we store the key
$z^x \in Z$ such that $H_1(z^x_1) \cdots H_h(z^x_h)$ has the longest possible common prefix with
$H_1(x_1) \cdots H_h(x_h)$. The key $z^x$ is found from $x$ in constant time.

We now claim that no key $z^y \in Z$ can agree with $x$ in more characters than $z^x$. Suppose for a
contradiction that $z^x$ agrees with $x$ in the first $r - 1$ characters but not in character $r$, and that
$z^y$ agrees in the first $r$ characters. Then $H_1(x_1) \cdots H_r(x_r) = H_1(z^y_1) \cdots H_r(z^y_r)$.
However, since $H_r$ is 1-1 on $\{z_r^j\}_{1 \le j \le q}$, $H_r(z^x_r) \ne H_r(z^y_r) = H_r(x_r)$, so the
hashed $z^y$ shares a longer prefix with the hashed $x$ than the hashed $z^x$ does, contradicting the
choice of $z^x$.

All that remains is to compute the number of whole characters in the common prefix of $x$ and
$z^x$. This can be done by clever use of multiplication as described in [10]. A more practical solution
based on converting integers to floating point numbers and extracting the exponent is discussed
in [17]. □
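The final step of the proof, counting whole characters in the common prefix of two keys, can be sketched with an XOR followed by a most-significant-bit computation. This illustration uses Python's `int.bit_length` in place of the multiplication or floating point techniques cited above; the function name `common_chars` is introduced here.

```python
def common_chars(x, z, h, c):
    """Number of whole c-bit characters in the common prefix of two h-character keys."""
    diff = x ^ z                     # set bits mark the positions where the keys differ
    if diff == 0:
        return h                     # identical keys share all h characters
    top = diff.bit_length() - 1      # index of the most significant differing bit
    return h - 1 - top // c          # characters strictly above the one containing it
```

For example, with $h = 4$ characters of $c = 4$ bits, the keys `0xABCD` and `0xAB0D` first differ in the third character, so two whole characters agree.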

Using Lemma 20, we can compute in constant time the longest common prefix,
$\text{comm-pref}_Z[x]$, in whole characters, between $x$ and any key in $Z$. Also, if $x \notin Z$, we can
get the prefix $\text{comm-pref}^+_Z[x]$ that has one more character from $x$.

We now return to the proof of Proposition 18 which is similar to the one in [4, Section 4].
Out of our original set $Y$ of $n$ keys, we pick a subset $Z = \{z^1, \ldots, z^q\}$ of $q$ keys so that
there is a key from $Z$ among any sequence of $\lceil n/q \rceil$ consecutive keys from $Y$. We apply
Lemma 20 to $Z$. Thereby we use $O(q^{2h}\ell)$ bits of space. We are going to consider two types of
subproblems.

Cardinality reduced problems First we have the cardinality reduced subproblems. These are
of the following type: we take a key $y$ from $Y \setminus Z$ and consider the prefix
$v = \text{comm-pref}^+(y, Z)$. Let $\mathrm{Agree}_Y[v]$ denote the keys from $Y$ that have prefix $v$.
These keys are consecutive and they do not contain any key from $Z$, so
$|\mathrm{Agree}_Y[v]| < \lceil n/q \rceil$. We use a 2-level hash table for the prefixes in
$V = \{\text{comm-pref}^+(y, Z) \mid y \in Y \setminus Z\}$. With $v \in V$, we store $\mathrm{pred}_Y[v]$ and
$\max_Y[v]$ as defined in the previous section, that is, $\mathrm{pred}_Y[v]$ is the strict predecessor in
$Y$ of $v$ suffixed by zeros, and $\max_Y[v]$ is the largest key in $Y$ with prefix $v$. Finally, as the
cardinality reduced subproblem, we have
$\mathrm{Agree}^-_Y[v] = \mathrm{Agree}_Y[v] \setminus \{\max \mathrm{Agree}_Y[v]\}$.

The above information suffices to find the predecessor of any query key $x$ with
$\text{comm-pref}^+(x, Z) \in V$. The bit space used above is $O(|V|\ell)$. Each cardinality reduced
subproblem $\mathrm{Agree}^-_Y[v]$ has at most $\lceil n/q \rceil - 2$ keys, and they add up to a total of
$n - |Z| - |V|$ keys.
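The grouping of $Y \setminus Z$ into cardinality reduced subproblems can be sketched as follows. This is a brute-force illustration: the common prefix with $Z$ is found by a linear scan over $Z$ instead of the constant-time structure of Lemma 20, $Z$ is chosen as every $\lceil n/q \rceil$-th key of the sorted input (one way to meet the spacing requirement), and all function names are introduced here.

```python
from math import ceil

def chars(key, h, c):
    """Split an (h*c)-bit key into its h characters, most significant first."""
    mask = (1 << c) - 1
    return tuple((key >> (c * (h - 1 - i))) & mask for i in range(h))

def comm_pref_plus(y, Z_chars, h, c):
    """Longest whole-character prefix y shares with any key of Z, plus one more character of y."""
    yc = chars(y, h, c)
    best = 0
    for zc in Z_chars:
        r = 0
        while r < h and yc[r] == zc[r]:
            r += 1
        best = max(best, r)
    return yc[:best + 1]            # y is not in Z, so best < h

def cardinality_reduced(Y, q, h, c):
    """Return Z and a dict mapping each prefix v in V to the keys Agree_Y[v]."""
    Y = sorted(Y)
    g = ceil(len(Y) / q)
    Z = Y[g - 1::g]                 # every g consecutive keys of Y contain a key of Z
    Z_chars, Z_set = [chars(z, h, c) for z in Z], set(Z)
    groups = {}
    for y in Y:
        if y not in Z_set:
            groups.setdefault(comm_pref_plus(y, Z_chars, h, c), []).append(y)
    return Z, groups
```

Every group consists of consecutive keys of $Y$ with no key of $Z$ among them, so each has fewer than $\lceil n/q \rceil$ keys.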

Length reduced subproblems For query keys $x$ with $\text{comm-pref}^+(x, Z) \notin V$, we will consult
length reduced subproblems defined over the set $U$ of prefixes of keys in $Z$. We will have a 2-level
hash table over $U$. For each $u \in U$, let $\text{Next-char}_Y[u]$ be the set of characters $c$ such that
$uc$ is a prefix of a key in $Y$. We will have a length reduced subproblem over the characters in
$\text{Next-char}^-_Y[u] = \text{Next-char}_Y[u] \setminus \{\max \text{Next-char}_Y[u]\}$. As complementary
information, we store $\mathrm{pred}_Y[u]$ and $\max_Y[u]$.

Now, consider a query key $x$ with $\text{comm-pref}^+(x, Z) \notin V$. Let $u = \text{comm-pref}(x, Z)$ and
let $d$ be the subsequent character in $x$, that is, $ud = \text{comm-pref}^+(x, Z)$. Then
$d \notin \text{Next-char}_Y[u]$. Suppose $x$ is between the smallest and the largest key in $Y$ with prefix
$u$. If $c$ is the predecessor of $d$ in $\text{Next-char}_Y[u]$, then the predecessor of $x$ in $Y$ is the
largest key with prefix $uc$. However, $uc \in V$, so the predecessor of $x$ is the $\max_Y[uc]$ stored
under the length reduced subproblems.

The above length reduction used $O(\ell)$ bits for each $u \in U$ and $c \in \text{Next-char}_Y[u]$. Consider
$c \in \text{Next-char}_Y[u]$. There can be at most $|U|$ cases where $uc \in U$. Otherwise, we have
$uc = \text{comm-pref}^+(y, Z) \in V$. The total bit space of the length reduction is hence
$O((|U| + |V|)\ell)$.

We will now prove that the total number of keys in the length reduced subproblems
$\text{Next-char}^-_Y[u]$ is at most $|Z| + |V|$. Above we saw that if a character
$c \in \text{Next-char}_Y[u]$ did not represent a prefix in $U$, it represented a prefix in $V$. Those
representing prefixes in $U$ can also be viewed as representing children in the trie over $Z$. The total
number of such children is at most $|Z|$ plus the number of internal trie nodes, and since we for
$\text{Next-char}^-_Y[u]$ subtracted a node for each internal trie node $u$, we conclude that the total
number of keys in the length reduced subproblems is bounded by $|Z| + |V|$.

Pseudo-code We now have the following recursive pseudo-code for searching the predecessor of
$x$ in $Y$:

Pred(x, Y)
    if x ∈ Z return x
    let ud = comm-pref^+(x, Z) with d the last character
    if ud ∈ V then
        if x ≥ max_Y[ud] then return max_Y[ud]
        y = Pred(x, Agree^-_Y[ud])
        if y = -∞ then return pred_Y[ud]
        return y
    if x ≥ max_Y[u] then return max_Y[u]
    c = Pred(d, Next-char^-_Y[u])
    if c = -∞ then return pred_Y[u]
    return max_Y[uc]



Final analysis This almost finishes the proof. Let $m = |Z| + |V|$. Then we have at most $n - m$
keys in cardinality reduced subproblems and at most $m$ keys in length reduced subproblems.

The total bit space used is $O(q^{2h}\ell)$ for the application of Lemma 20, $O(|V|\ell)$ for the
cardinality reduction, and $O((|U| + |V|)\ell)$ for the length reduction. Here
$|U| = O(hq) = O(q^{2h})$ and $|V| \le m$, so the total bit space is $O((q^{2h} + m)\ell)$. This completes
the proof of Proposition 18. □